A procedure, two-stage least squares (2SLS), for analyzing structural
equations when one or more explanatory variables are correlated
with the error or disturbance term is reviewed. A brief introduction to
structural equations and the use of ordinary least squares in causal analysis
is presented initially. This is followed by an introduction to 2SLS and the
application of 2SLS to designs in which (a) two or more variables are
reciprocal causes of each other, (b) one or more variables contain random
measurement error, and (c) lagged values of one or more dependent variables
are used as predictors. A major objective for the review of these particular
applications of 2SLS is to demonstrate how salient problems in psychology
can be addressed by use of structural equations.


Applications  of  Two-Stage  Least  Squares  in  Causal 
Analysis  and  Structural  Equations 


A procedure, two-stage least squares (2SLS), for estimating structural
parameters when one or more explanatory variables are correlated
with the error or "disturbance" term is reviewed. A brief introduction to
structural equations and the use of ordinary least squares in causal analysis
is presented initially. This is followed by an introduction to 2SLS and the
application of 2SLS to designs in which (a) two or more variables are
reciprocal causes of each other, (b) one or more variables contain random
measurement error, and (c) lagged values of one or more dependent variables
are used as predictors, which focuses on the analysis of the cross-lagged
panel correlation design in terms of structural equations. The above
applications were selected because they were presumed to be of interest to
psychologists. They are not, however, exhaustive of the applications of
2SLS.


STRUCTURAL  EQUATIONS  AND  CAUSAL  ANALYSIS 
A structural equation refers to the "representation of the true structural
or causal properties of real-world phenomena, as contrasted with equations
that are merely used for prediction or estimation purposes" (Namboodiri,
Carter, & Blalock, 1975, p. 448). Some recent efforts in psychology have
attempted to acquaint psychologists with causal inferences based on recursive
structural equations (cf. Feldman, 1975; Kalleberg & Kluegel, 1975; Kenny,
1975; Kerlinger & Pedhazur, 1973; Sims & Szilagyi, 1975; Werts & Linn, 1970).

A recursive design is one in which the hypothesized causal relationships are
unidirectional or asymmetric, as demonstrated in Figure 1. A recursive
model requires that in a causal sequence each $X_i$ must precede each $X_j$,
where $i < j$, and that $X_j$ may be caused either directly or indirectly by
each $X_i$, but that an $X_i$ cannot be caused by an $X_j$. Thus, causal
closure implies that the relationships are asymmetric in a recursive or
unidirectional causal model.


Insert  Figure  1 about  here 

Recursive models are applicable to numerous types of data sets (cf.
Werts & Linn, 1970), but there exist many designs for which they are not.
Examples where recursive models would not suffice are provided by social
system theory (cf. Indik, 1968; James & Jones, 1976; Katz & Kahn, 1966;
Lichtman & Hunt, 1971; Sells, 1963, 1968) and interactional paradigms (cf.
Bowers, 1973; Ekehammar, 1974; Endler, 1975; Endler & Magnusson, 1976).
These theories project complex models that incorporate feedback and
reciprocal causation or simultaneity (Goldberger, 1973; Singh & Williams,
1972; Singh, 1975), and include reciprocal interactions between individuals
and situations over time (Overton & Reese, 1973). Another example is a
postulate of system theory which states that all events are correlated, thus
obscuring unidirectional cause-effect relationships (Katz & Kahn, 1966).
These theories raise obvious questions for structural analyses that employ
recursive models and their accompanying structural equations. Rather,
nonrecursive models that incorporate reciprocal causation are required. An
example of a nonrecursive model is presented in Figure 2. In this model,
an $X_i$ may be caused directly or indirectly by several $X_j$, including
cases where $i < j$.


Insert Figure 2 about here


It will be emphasized throughout this paper that the structural equations
representing recursive or nonrecursive structural models, or variations
of these models such as block-recursive models (cf. Fisher, 1966), can be
formulated in terms of the general linear model. When viewed in this manner,
structural equations may be employed to test causal hypotheses for a
multitude of designs, including (a) truly experimental designs involving
randomization and intervention (cf. Miller, 1971), (b) quasi-experimental
designs, and (c) cross-sectional (static) correlational designs involving
data obtained from natural settings. Applications of structural equations to
different designs are of course based upon different assumptions, some of
which may not be testable. Furthermore, the assumptions are of primary
importance because simply casting analyses in terms of structural equations
does not guarantee that one is actually testing causal hypotheses.

Our attention here is focused on applications of structural equations
to correlational designs which employ "passive data" (Cook & Campbell, 1976)
obtained on either static or longitudinal bases. It is important to
emphasize that the application of structural equations to correlational data,
particularly from cross-sectional designs, does not represent an attempt
to identify an unambiguous, unique causal model. Rather, it represents a
method for examining the logical consistency of alternative causal hypotheses
and models, and for rejecting those that are untenable (Duncan, 1966;
Feldman, 1975; Goldberger, 1973; Kerlinger & Pedhazur, 1973; Werts & Linn,
1970). When there are a priori, untestable assumptions or different structural
equations that fit the data equally well, a single, correct set of
structural equations cannot be ascertained. However, untenable causal models
that do not fit the data may be discarded. Thus, the application of structural
equations to correlational data does not represent an attempt "to accomplish
the impossible task of deducing causal relations from the values of
correlation coefficients" (Wright, 1934, in Duncan, 1966, p. 15), but an
attempt to examine whether any alternative theoretical models accepted for
possible causal interpretation are logically consistent with the data.

This should not be construed to mean that one is necessarily "making
causal inferences based on correlational data", which is both ambiguous and
an overstatement (Duncan, 1975, p. 47). On the other hand, rejecting the
consideration of causal relations in correlational designs has frequently
led to descriptive rather than explanatory interpretations of data, or even
worse, to degradation of research designs to the extent that subgoals such
as maximizing validity coefficients in the absence of explanatory theory
have become ends in themselves. By contrast, thinking in terms of structural
equations and causality with nonexperimental data requires a strong
theoretical base and has definite advantages. Among these are an emphasis on
explanation rather than description (Strotz & Wold, 1971), estimation of
change rather than fixed values (Namboodiri et al., 1975), and a pattern of
interpretation that makes explicit the rationale and assumptions underlying
analytical procedures while simultaneously forcing the discussion of results
to be at least internally consistent (Duncan, 1966). Thus, explanatory
theories encompassing change and causality, and the goodness of fit of data
to such theories, become the primary focus of structural equations. If
nothing else, such an approach forces investigators to place primary concern
on theoretical issues.

To state the matter directly, it is possible to adopt causal, structural
equation models with correlational data based on natural observations
if one is willing to make assumptions regarding prior time sequencing of
variables and relationships among variables (some of which may not be
testable) and statistical assumptions (some of which may also not be testable).
The fact that economists, and more recently sociologists, have been routinely
employing such models is a case in point. The adoption of such models and
accompanying assumptions provides the basis for the remainder of this paper.

STRUCTURAL  EQUATIONS  AND  THE  USE  OF  ORDINARY  LEAST  SQUARES 

Structural equations are forms of the general linear model, and when
appropriate, may be represented as ordinary least squares (OLS) multiple
regression equations, where standardized or unstandardized (partial)
regression weights provide unbiased and consistent estimates, based on a
sample, of the population causal or structural parameters. That is, when
regression weights are employed as structural, rather than just statistical,
parameters, then reference is being made to causal, real-world phenomena
(Duncan, 1975; Heise, 1975; Namboodiri et al., 1975). For example, an
unstandardized regression weight employed as an estimate of a structural
parameter provides an indication of "the mean change in the dependent
variable expected to result for each unit of change in one particular
independent variable, assuming other independent variables are held constant"
(Darlington & Rom, 1972, p. 452). On the other hand, it is possible to
demonstrate several sources of error which result in inconsistent and biased
parameter estimation if OLS is employed. The use of OLS for parameter
estimation is discussed here briefly. This is followed by a discussion of
conditions which preclude the use of OLS, and necessitate the use of other
procedures, particularly 2SLS.

In general, in a causal system one is usually addressing several dependent
variables. If a structural equation for each dependent variable is
delineated, a system of simultaneous equations is obtained. Generally, the
system of simultaneous equations may be viewed as a system of multiple
regression equations, where the direct causal factors for each dependent
variable are considered predictors. If one (structural) equation is selected
from this system, it would take the general form (assuming linearity and
additivity)

$$Y_g = a + b_{g1} X_1 + b_{g2} X_2 + \cdots + b_{gk} X_k + d_g \qquad (1)$$

where $Y_g$ represents the dependent variable, $a$ represents the
intercept, $X_1 \ldots X_k$ represent the predictors (raw scores),
$b_{g1} \ldots b_{gk}$ represent the unstandardized regression
coefficients for the $X_k$, and $d_g$ represents the disturbance term.
(Disturbance terms are synonymous with error terms, and include variance
resulting from sampling error as well as effects of unknown or unmeasured
outside influences such as measurement error and variables that assist in
causal explanation but are omitted from the theory and/or measurement.)


In this context, the $b_{gk}$ coefficients represent sample estimates of
population structural parameters. It will be assumed that the predictors
are random variables and that the samples are random and large. Given
these conditions, one of the primary concerns is whether the $b_{gk}$ provide
consistent estimates of the population structural parameters. It will be
assumed that if random samples of increasing size (n) are selected, then
estimates of the structural parameters ($b_{gk}$) will converge to the
population structural parameters ($B_{gk}$). This is referred to as a
probability limit, or "plim", and is designated by plim $b_{gk} = B_{gk}$
(Johnston, 1972), which connotes that b will converge to B in the limit.
Consistent estimators may or may not be asymptotically unbiased, although in
most applications consistent estimators also tend to be asymptotically
unbiased estimators (Pindyck & Rubinfeld, 1976, p. 24).
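
The idea of consistency can be illustrated with a small simulation. The following is a minimal sketch and not part of the original analysis: the population values (intercept 1.0, slope 2.0), the sample sizes, and the function name `ols_slope_error` are assumptions chosen purely for illustration, and the disturbance is generated independently of the predictor so that the OLS assumptions hold by construction.

```python
import numpy as np

def ols_slope_error(n, true_b=2.0, seed=0):
    """Return |b - B| for an OLS slope estimated from a random sample of size n.

    Hypothetical population model: y = 1.0 + true_b * x + d, with d independent of x.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)            # random predictor
    d = rng.normal(size=n)            # disturbance, uncorrelated with x
    y = 1.0 + true_b * x + d
    X = np.column_stack([np.ones(n), x])   # intercept column plus predictor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return abs(coef[1] - true_b)

# Estimation error for samples of increasing size
errors = {n: ols_slope_error(n) for n in (50, 5_000, 500_000)}
for n, err in errors.items():
    print(f"n = {n:>7}: |b - B| = {err:.4f}")
```

Under these assumptions the error for the largest sample should be very small, although the error need not shrink monotonically from one particular sample to the next; consistency is a statement about the limit, not about any single draw.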

The assumptions required to apply OLS to equation 1 to obtain estimates
of the structural parameters include linearity, additivity, interval scales,
and random sampling. It is also generally assumed that the disturbance terms
are distributed as $N(0, \sigma_d^2)$. Additional assumptions required for at
least consistent estimation are:

(1) the predictors have no measurement error (random or nonrandom),

(2) the disturbance terms from different structural equations are
uncorrelated, and

(3) the predictors are uncorrelated in the limit with the disturbance
term. This assumption may be represented as plim $[(1/n)\,X'd] = 0$
(cf. Christ, 1966; Johnston, 1972; Theil, 1971), and implies that
any variable which is causally connected to the dependent variable
but has been left out of the structural equation is not causally
connected to any of the predictors (cf. Blalock, Wells, & Carter,
1970). That is, the effects of an omitted variable will be part of
the disturbance term, therefore resulting in a correlation between
the disturbance term and any predictor causally related with the
omitted variable. From another perspective, this assumption implies
that all major causes of the dependent variable are included in the
structural equation.
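
The consequence of violating assumption (3) through an omitted variable can be sketched numerically. This is an illustrative simulation only: the variable names (`z` for the omitted cause, `x` for the included predictor), the coefficient values, and the helper `ols` are assumptions introduced here, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z = rng.normal(size=n)                       # cause of y that will be omitted
x = 0.8 * z + rng.normal(size=n)             # predictor causally connected to z
y = 1.0 * x + 1.0 * z + rng.normal(size=n)   # true structural weight of x is 1.0

def ols(columns, y):
    """OLS coefficients (intercept first) for the given predictor columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_short = ols([x], y)[1]      # z omitted: its effect is absorbed by the disturbance
b_full = ols([x, z], y)[1]    # z included: disturbance uncorrelated with x

print(f"z omitted : b = {b_short:.3f}")
print(f"z included: b = {b_full:.3f}")
```

With these hypothetical values the weight for `x` is overstated when `z` is left out, because the disturbance then contains `z` and is correlated with `x`; including `z` brings the estimate back near the structural value of 1.0.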

Any violation of the above assumptions may lead to biased and inconsistent
estimates of the structural parameters if OLS is employed. Of primary
importance is the last assumption, which is almost impossible to meet in
empirical research. That is, unless the theoretical causal system is
completely known and all predictors measured with perfect reliability, the
last assumption will ordinarily be violated and some degree of inconsistency
and bias will be introduced into the structural parameter estimates. Problems
associated with omitted variables will be addressed in more detail later;
however, at this time other conditions resulting in correlations between
predictors and disturbance terms, and the necessity for using procedures
other than OLS for consistent parameter estimation, will be discussed.

The first of these conditions involves a situation where one or more of
the predictors is in fact a dependent variable in another equation in the
system (e.g., $X_k = Y_{g+1}$), and the two dependent variables ($Y_g$ and
$Y_{g+1}$) are reciprocally related. At least part of the equation system is
then nonrecursive, the variable that is reciprocally related to the criterion
is correlated with the disturbance term, and therefore OLS is not appropriate
for parameter estimation. The second condition arises when one or more of
the predictors includes random measurement error. This condition also
results in a correlation between the predictors measured with error and the
disturbance term, and again OLS is not appropriate for parameter estimation.
Finally, if one of the predictors is in fact a lagged value of the dependent
variable (e.g., $X_k = Y_{g,t-1}$), and the disturbance terms are serially
correlated over time, then the lagged dependent variable will be correlated
with the disturbance term in the limit and OLS should again not be employed.
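
The first condition, reciprocal causation, can be sketched with a two-equation simulation that contrasts OLS with a two-stage estimate. Everything specific here is an assumption made for illustration: the reciprocal weights `b12` and `b21`, the unit weights on the exogenous variables `x1` and `x2`, and the helper `fit`. Variables are generated with zero means, so the equations are in deviation form and no intercept is fitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
b12, b21 = 0.5, 0.4          # hypothetical reciprocal effects: y2 -> y1 and y1 -> y2

x1, x2 = rng.normal(size=n), rng.normal(size=n)   # exogenous variables
d1, d2 = rng.normal(size=n), rng.normal(size=n)   # disturbances

# Structural system: y1 = b12*y2 + x1 + d1 and y2 = b21*y1 + x2 + d2.
# Solving the two equations simultaneously gives the observed scores.
u1, u2 = x1 + d1, x2 + d2
det = 1.0 - b12 * b21
y1 = (u1 + b12 * u2) / det
y2 = (u2 + b21 * u1) / det

def fit(X, y):
    """Least squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# OLS on the y1 equation: y2 is correlated with d1, so the estimate is inconsistent.
b_ols = fit(np.column_stack([y2, x1]), y1)[0]

# Stage 1: regress y2 on all exogenous variables and keep the predicted scores.
Z = np.column_stack([x1, x2])
y2_hat = Z @ fit(Z, y2)
# Stage 2: replace y2 with its predicted scores in the y1 equation.
b_2sls = fit(np.column_stack([y2_hat, x1]), y1)[0]

print(f"true b12 = {b12}, OLS = {b_ols:.3f}, two-stage = {b_2sls:.3f}")
```

In this sketch the OLS weight for `y2` is noticeably larger than the structural value, while the two-stage estimate lies close to it. The two-stage estimate works here only because `x2` is excluded from the `y1` equation, which is the kind of restriction that identification depends on.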

Each of the above three conditions is addressed separately in this
report. It will be shown how 2SLS, or modified 2SLS, can be applied to the
structural equations to obtain at least consistent estimates of the
structural parameters (assuming that other assumptions are met). These
applications are discussed in the following order: (a) analysis of
nonrecursive structural equations and an introduction to 2SLS, (b) the
application of 2SLS to structural equations which include predictors that
have random measurement error, and (c) the application of a modified version
of 2SLS to structural equations which include lagged dependent variables and
serially correlated disturbances.

THE  ANALYSIS  OF  NONRECURSIVE  STRUCTURAL  EQUATIONS 
AND  AN  INTRODUCTION  TO  2SLS 
Logic of Nonrecursive Models

A nonrecursive model is one in which two or more variables to be
explained by the model (i.e., dependent variables) are mutually dependent and
reciprocal causes of one another. It is also assumed that the mutual effects
of the two or more variables in mutual interaction are relatively rapid or at
least the time lags are short and cannot be meaningfully identified or
measured (cf. Namboodiri et al., 1975). (If meaningful time lags are
identifiable and measurable, then the model can be treated as recursive.) For
exemplary purposes, we shall assume that the models presented here incorporate
only cross-sectional, correlational data based on natural observations, and
that the structural equations are algebraic in form. It is also assumed
that the relationships among the mutually interacting variables are stable,
or have reached an equilibrium-type condition (Miller, 1971); this is
discussed in more detail below.

The nonrecursive model selected for this discussion is presented in
Figure 3. In this figure, the three variables labeled by a Y (i.e., $Y_1$,
$Y_2$, $Y_3$) are endogenous variables. Endogenous variables are dependent
measures that are to be explained by the theory or model. For example, as
shown in Figure 3 each of the endogenous variables is dependent upon each of
the other endogenous variables through a system of reciprocal relationships.
Each endogenous variable is also dependent upon one or more predetermined
variables, which are represented by the variables labeled by an X (i.e.,
$X_1$, $X_2$, $X_3$, $X_4$). In general, predetermined variables consist of
(a) lagged values of the endogenous variables, and (b) exogenous variables,
which are lagged or non-lagged variables that are considered to be separate
causes of the endogenous variables. The predetermined variables are treated
as "givens" and are assumed to provide explanatory power to the model but are
not themselves to be explained by the model. Moreover, they are not dependent
on the endogenous variables, and are treated as predictors or independent
variables. Because Figure 3 is a cross-sectional model, the predetermined
variables ($X_k$) consist only of non-lagged exogenous variables.


Insert Figure 3 about here


The curved lines among the exogenous variables in Figure 3 connote that
relationships exist among these measures. Curved lines also indicate
relationships that are not explained by the model. As shown in Figure 3,
$X_1$, $X_2$, and $X_3$ are intercorrelated, but $X_4$ is not correlated with
any of the other exogenous variables. The arrows in the model, both from
exogenous variables to endogenous variables and among the endogenous variables
(i.e., reciprocal relationships), represent the causal inferences. Associated
with each arrow is an unstandardized regression coefficient or structural
parameter (e.g., $b_{12}$). Finally, each endogenous variable has associated
with it a disturbance term, which is designated by a lowercase "d" and
subscripted by a numeral corresponding to the numeral of the endogenous
variable.

Returning briefly to the assumption of equilibrium, it is assumed that
for each subject the mutual effects of the three endogenous variables ($Y_1$,
$Y_2$, $Y_3$) on each other have reached a state of stability, and the levels
of each of the variables are constant for each individual subject. It is
further assumed that the levels of the exogenous variables are temporally
fixed for each subject, that the effects of the exogenous variables on the
endogenous variables have been relatively rapid, and that the structural
model is appropriate for all members of the sample (and population). It is
then possible to conduct the analysis by employing comparisons across
subjects to infer processes that have been at work within subjects (Namboodiri
et al., 1975).²

Statistics of Nonrecursive Models: An Introduction to 2SLS

A number of statistical procedures are available for the analysis of
nonrecursive models, including indirect least squares, 2SLS, three-stage least
squares, and limited-information and full-information maximum likelihood
functions. We shall focus here on 2SLS, which has been shown to be applicable
to social science data (cf. Duncan, 1970; Duncan, Haller, & Portes, 1968, 1971;
Kohn & Schooler, 1973; Mason & Halter, 1968; Miller, 1971; Namboodiri et al.,
1975; Waite & Stolzenberg, 1976), and is generally considered to be as powerful
as more sophisticated methods (cf. Christ, 1966; Johnston, 1972; Theil, 1971;
King, Note 1). The introduction to 2SLS begins with a discussion of
identification, proceeds to statistical assumptions, analytical procedures,
and tests for goodness of fit, and concludes with a summary in which an
application of a nonrecursive model and 2SLS is proposed for a current
psychological research problem.

Two-stage least squares, developed separately by Basmann (1957) and Theil
(1953a, b), is a simultaneous equation estimation method in which the estimation
of structural parameters is conducted independently for each equation in the
system. However, 2SLS cannot proceed without first addressing the question
of identification, which involves determination of whether sufficient
information exists to estimate the unknown structural parameters of the
structural equations (Theil, 1971). It should be noted that structural
parameters and structural equations are population terms; in the presentation
below, however, we continue to use the term structural equations for data
based on random samples, but differentiate between population structural
parameters and their sample estimates.

Identification

Theil (1971, pp. 448, 449) has succinctly defined identification in the
following manner:

. . . when we have a complete linear system of L equations, the
parameters of the jth equation are not estimable when there exists
a linear combination of the other L - 1 equations that does not
contain any of the variables of the system which do not occur in the
jth equation; or, to put it in more positive terms, the parameters
of the jth equation are not estimable when there exists a linear
combination of the other equations that contains only the variables
which do occur in the jth equation, and possibly fewer. In that
case the jth equation is said to be not identifiable (or
underidentified) in its system.

In the general case, and using present terminology, the identification
of an equation rests on meeting two conditions, which are the order
condition and the rank condition. With respect to the order condition, there
will be a system of G structural equations representing G endogenous variables.
To these equations will be added K predetermined variables, which as noted
earlier will consist only of non-lagged exogenous variables in the present
examples. For identification purposes, each equation must have G - 1
variables deleted from the total possible G + K variable set, where the deleted
variables may be either endogenous or exogenous. If the number of variables
deleted from an equation is equal to G - 1, the equation is exactly identified.
If the number of variables deleted is greater than G - 1, the equation is
overidentified. Otherwise, the equation is underidentified, and no solution
exists. It should be noted that within a set of G equations, some equations
may be exactly identified while others may be overidentified (or even
underidentified).
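
The counting rule for the order condition can be written as a short function. This is a sketch only: the function name `order_condition` and its argument names are hypothetical, the count of variables in an equation is taken to include the dependent variable, and the check covers the order condition alone, which is necessary but not sufficient for identification.

```python
def order_condition(n_endogenous, n_predetermined, n_in_equation):
    """Classify one structural equation by the order condition.

    n_endogenous    -- G, the number of endogenous variables (and equations)
    n_predetermined -- K, the number of predetermined variables
    n_in_equation   -- variables appearing in the equation, including the
                       dependent variable
    """
    deleted = (n_endogenous + n_predetermined) - n_in_equation
    required = n_endogenous - 1
    if deleted == required:
        return "exactly identified"
    if deleted > required:
        return "overidentified"
    return "underidentified"

# With G = 3 and K = 4, an equation containing five variables has
# 7 - 5 = 2 = G - 1 variables deleted.
print(order_condition(3, 4, 5))
print(order_condition(3, 4, 4))
print(order_condition(3, 4, 6))
```

Because the rank condition is not examined, a result of "exactly identified" or "overidentified" from this function should be read as "passes the order condition", not as a guarantee that the equation can be estimated.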

While the order condition is necessary, it is not sufficient for the
identification of an equation. A necessary and sufficient condition is the
rank condition (cf. Fisher, 1966). A discussion of the rank condition
involves considerable mathematical complexity, and thus such discussion was
included in a technical appendix to this paper (Appendix B). However, it is
generally the case that if the order condition is met, the rank condition is
also met (Namboodiri et al., 1975). An exception occurs when the structural
equations for two or more endogenous variables contain the same combinations
of variables.

It is important to note that the selection and addition of exogenous
(or more generally, predetermined) variables to the structural equations
should be based on sound theory, and that trivial variables should not be
added to the equations solely for identification purposes (Duncan, 1975).
The criteria for selection of exogenous variables are that (a) the
hypothesized direct effects should be significant and substantial, (b) the
hypothesized indirect effects should be significant and substantial, and
(c) the exogenous variables are not dependent on the endogenous variables at
the time of the study.

Statistics  of  2SLS 

The use of 2SLS is illustrated by an example, using the nonrecursive
model presented in Figure 3.

Design of the structural equations. The first step in 2SLS is to write
out the structural equations for each of the endogenous variables. The
structural equation for each endogenous variable includes those endogenous
and exogenous variables and their (estimated) parameters that have a direct
relationship (i.e., arrows in Figure 3) with the endogenous variable, plus
the disturbance term. The structural equations for Figure 3, in
deviation form, are

[Equations 2, 3, and 4]

where the $b_{gh}$ ($g \neq h$) represent estimates of the structural
parameters for the mutually interacting endogenous variables, the
remaining coefficients represent estimates of the structural parameters
for the exogenous variables, and the $d_g$ represent disturbance terms.

Order condition. With respect to the order condition required (but not
sufficient) for identification, G = 3, K = 4, and G + K = 7. The first
equation, equation 2, is exactly identified because G - 1 = 2 variables have
been deleted from the equation (i.e., there are five variables, including the
dependent variable, in the equation). Equations 3 and 4 are overidentified
because more than two variables have been deleted from each equation.

Statistical assumptions. The statistical assumptions underlying the
2SLS procedure are:

1) The causal effects are linear and additive.

2) Variables have been measured on interval scales.

3) The independent variables have neither random nor nonrandom measurement
error, which in the above model would mean that all variables should
be perfectly reliable because each endogenous variable is used as an
independent variable (as well as a dependent variable).

4) The exogenous variables ($X_k$) are uncorrelated with the disturbance
terms ($d_g$) in the limit (i.e., plim $[(1/n)\,X_k' d_g] = 0$). As noted
earlier, this implies that all major causes of the dependent
variables have been ascertained.

5) $E(d_g) = 0$, and the disturbances are normally distributed (an
assumption that allows the use of statistical tests [cf. Johnston, 1972]).

6) The sample selected is random if drawn from a finite population.

An additional assumption is that the variables are ordered correctly. This
is similar to the identification question (i.e., some variables are deleted
from each equation) and implies that selection of variables is based on
theory and hopefully involves previous research (Duncan, 1975). Violations
of the above assumptions are generally referred to as specification errors
(cf. Spaeth, 1975).

Of  importance  here  is  the  omission  of  the  assumptions  associated  with 
recursive  models  and  OLS  that  all  variables  in  an  equation  be  uncorrelated 
in  the  limit  with  the  disturbance  term  of  that  equation,  and  that  the  distur- 
bance terms  of  different  equations  be  uncorrelated.  In  nonrecursive  models, 
the  endogenous,  but  not  the  exogenous,  variables  are  assumed  to  be  correlated 
with  disturbance  terms,  and  the  disturbance  terms  may  also  be  correlated. 

The  rationale  for  these  conditions  is  that  the  mutual  interactions  among  a 
set of endogenous variables result in influences on each endogenous variable
by the disturbance terms of the other endogenous variables (cf. Johnston,
1972, p. 343). That is, the disturbance terms include variance representing
reciprocal causes of the endogenous variables which are reciprocally related
(Namboodiri et al., 1975).

With  respect  to  the  assumptions  of  2SLS,  the  absence  of  measurement  error 
or  the  restriction  of  the  equations  to  variables  measured  by  interval  scales 
might appear to be overly confining to psychologists. Fortunately, models
are available for guiding analysis when some of the assumptions cannot be
met. Models developed for random measurement error are discussed in the
following section of this paper, and the reader is referred to Namboodiri et
al. (1975) for a discussion of models which involve nonrandom measurement
error. With respect to interval scales, while some authors have argued that
ordinal  scales  will  suffice  for  parametric  purposes  (cf.  Bohrnstedt  & Carter, 
1971;  Spaeth,  1975),  it  appears  reasonable  to  require  that  the  scales  be 
"essentially  interval"  (i.e.,  while  perhaps  not  perfectly  interval,  the 
scales  should  possess  interval  qualities  and  be  regarded  as  substantially 
better  than  ordinal).  This  rationale  is  based  on  the  causal  interpretation 
of  a structural  parameter  presented  earlier,  where  one  would  presume  that  a 
unit  increase  in  the  predictor  should  connote  (approximately)  uniform  degrees 
of  change  throughout  the  range  of  the  scale  (the  same  is  true  for  dependent 
variables).  On  the  other  hand,  paradigms  are  available  for  the  use  of  ordi- 
nal and  nominal  scales  in  structural  equations  (cf.  Boyle,  1970;  Namboodiri 
et  al.,  1975).  Such  paradigms  are  closely  associated  with  present  statistical 
knowledge in psychology inasmuch as structural equations are a form of the
general  linear  model. 


Violations of assumptions regarding linearity and additivity have also
been addressed (cf. Darlington & Rom, 1972; Namboodiri et al., 1975; Werts
& Linn, 1970). Because structural equations are forms of the general
linear model, polynomial regression and the use of cross-products to represent
interaction terms are applicable, although there are some questions as
to whether these procedures can be employed with random variables (Sockloff,
1976). Moreover, the addition to the structural equations of squared, cubed,
etc., terms and possible moderators and cross-products involving moderators
(cf. Saunders, 1956) may result in problems concerning identification,
multicollinearity, and interpretation of the regression weights for
cross-product terms. Identification problems pertain to the need to include
more predetermined variables in the equations when polynomial regression or
interaction terms increase the number of endogenous variables. Multicollinearity
concerns the problem where highly intercorrelated variables (e.g., a variable and a
cross-product  term  in  which  the  variable  is  included)  lead  to  "bouncing  beta 
weights"  and  large  sampling  errors  for  the  estimates  of  the  structural 
parameters  (Darlington,  1968;  Werts  & Linn,  1970).  The  multicollinearity 
problem  is  not  limited  to  polynomial  regression  and  interaction  analysis;  it 
can  occur  with  any  highly  correlated  variables  that  enter  into  the  same 
equation.  Methods  for  alleviating  multicollinearity  include  deletion  of  vari- 
ables, formation  of  composites,  and  factor  analysis  (cf.  Goldberger,  1971; 
Johnston,  1972;  Joreskog,  1970).  Finally,  the  interpretation  of  regression 
weights  for  cross-product  terms  is  questionable  because  "regression  weights 
in  nonlinear  regression  equations  can  be  changed  by  changing  the  means  of  the 
independent variables, and the means are often chosen arbitrarily" (Darlington
& Rom, 1972, p. 453). The reader is referred to the Darlington and Rom
paper  for  possible  solutions  to  this  problem. 
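The dependence of regression weights on arbitrary means can be seen in a small numerical sketch. The data and generating weights below are hypothetical, and the outcome is made noise-free so that the fitted weights can be compared exactly. Adding a constant to one predictor leaves the cross-product weight alone but changes the weight of the other predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Noise-free outcome, so OLS recovers the generating weights exactly.
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.5 * x1 * x2


def fit_with_product(x1, x2, y):
    """OLS weights for the predictors [intercept, x1, x2, x1*x2]."""
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


original = fit_with_product(x1, x2, y)
# Same observations, but x2 is re-expressed with its origin moved by 10 units.
shifted = fit_with_product(x1, x2 + 10.0, y)
print(original)
print(shifted)
```

Algebraically, replacing x2 with x2 + 10 moves the x1 weight from 2.0 to 2.0 - 0.5(10) = -3.0, a sign reversal produced by nothing more than a change of origin, while the weights for x2 and the cross-product stay at 3.0 and 0.5.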

In  practice,  the  assumption  concerning  the  lack  of  correlation  between 
the  exogenous  variables  and  the  disturbance  terms  in  the  limit  is  often 
violated, as noted earlier with OLS. The omission of relevant variables
from  the  model  and  spuriousness  are  of  particular  concern.  It  is  generally 
impossible  to  assume  that  all  relevant  variables  are  known  and  included  in 
an  equation  for  a particular  endogenous  variable  (cf.  Duncan,  1975;  Heise, 
1975;  Kenny,  1975;  Spaeth,  1975).  The  costs  associated  with  omitting  rele- 
vant variables  are  a function  of  their  importance  in  the  system  and  of  the 
way  in  which  their  effects  are  transmitted  throughout  the  system  (Spaeth, 
1975).  The  effects  of  omitted  variables  might  include  a)  biased  estimates 
of  at  least  some  of  the  structural  parameters  included  in  the  model  and 
underestimation  of  the  dependent,  endogenous  variable  (Duncan,  1975;  Spaeth, 
1975);  b)  alternative  explanations  of  results  based  on  spurious  relationships 
between  measured  and  unmeasured  common  causes  (Kenny,  1975)  and,  as  noted 
above,  c)  correlations  among  predetermined  variables  and  the  disturbance 
terms,  as  well  as  correlations  among  the  disturbance  terms  for  reasons  other 
than  simultaneity  (cf.  Miller,  1971). 

In  general,  the  omitted  variable  specification  error  can  be  quite 
serious  because  it  implies  an  incomplete  theoretical  system.  On  the  other 
hand,  presently  unknown  omitted  variables  might  be  responsible  for  the  speci- 
fication error,  or  it  may  be  difficult  to  obtain  reliable  and  accurate 
measures  of  variables  of  hypothesized  theoretical  importance  (e.g.,  an 
ultimate criterion). In practice, therefore, it is not uncommon to allow
certain trade-offs. For example, exogenous variables with low rather than
zero  correlations  with  the  disturbance  term  in  the  limit  may  be  accepted 
(cf.  Fisher,  1971).  We  note  here,  however,  that  in  the  statistical  presen- 
tation  below  the  exogenous  variables  are  assumed  to  be  uncorrelated  with  the 
disturbance  terms  in  the  limit. 

Analytical  procedures.  The  presumed  correlations  between  the  endogenous 
variables and the disturbance terms in the limit (i.e., plim $[(1/n) \sum y_g d_h] \neq 0$,
where $g \neq h$) result in inconsistencies and bias if OLS is used to estimate
the values of the structural parameters in nonrecursive models. The 2SLS
procedure  is  employed  to  obtain  estimates  of  the  endogenous  variables  that 
are  uncorrelated  with  the  disturbance  terms  in  the  limit  for  the  equations  in 
which  the  endogenous  variables  are  used  as  predictors.  These  estimates  are 
then  used  to  obtain  consistent  estimates  of  the  structural  parameters.  Thus, 
two  stages  of  estimation  are  required  to  estimate  the  structural  parameters. 
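The problem that motivates the two stages can be illustrated with a small simulation. The two-equation reciprocal model and all parameter values below are hypothetical and chosen only for illustration. The sketch shows that an endogenous predictor is correlated with the disturbance of the equation in which it appears, and that the OLS weight for that predictor settles away from the generating value even in a large sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
b12, b21, c11, c22 = 0.5, 0.5, 1.0, 1.0   # hypothetical structural parameters

x1, x2 = rng.normal(size=n), rng.normal(size=n)   # exogenous variables
d1, d2 = rng.normal(size=n), rng.normal(size=n)   # structural disturbances

# Solve the simultaneous pair
#   y1 = b12*y2 + c11*x1 + d1
#   y2 = b21*y1 + c22*x2 + d2
# for y1 and y2 in terms of the exogenous variables and disturbances.
det = 1.0 - b12 * b21
y1 = (c11 * x1 + b12 * c22 * x2 + d1 + b12 * d2) / det
y2 = (b21 * c11 * x1 + c22 * x2 + b21 * d1 + d2) / det

# y2 depends on d1 through y1, so it is correlated with d1.
corr_y2_d1 = np.corrcoef(y2, d1)[0, 1]

# OLS on the first structural equation, using the observed y2.
ols, *_ = np.linalg.lstsq(np.column_stack([y2, x1]), y1, rcond=None)
ols_b12 = ols[0]

print(f"corr(y2, d1) = {corr_y2_d1:.3f}")
print(f"OLS estimate of b12 = {ols_b12:.3f} (generating value {b12})")
```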

A discussion  of  the  general  algebraic  steps  involved  in  applying  2SLS  to 
nonrecursive  equations  is  presented  below.  An  outline  of  the  matrix  algebra 
steps  involved  in  this  procedure  and  a discussion  of  the  rank  condition 
required for identification are presented in Appendix B.

To  obtain  estimates  of  the  endogenous  variables  that  are  uncorrelated 
with  the  disturbance  terms,  a reduced  form  of  the  set  of  structural  equations 
is  constructed.  A reduced  form,  which  is  the  first-stage  of  2SLS,  consists 
of  a set  of  equations  in  which  each  endogenous  variable  is  represented  as  a 
function  of  only  the  exogenous  (predetermined)  variables  and  a disturbance 
term (Duncan, 1975). That is, each endogenous variable serves as dependent
variable for one equation, and the independent variables for each of these
equations include all exogenous (predetermined) variables from the system of
equations, plus a disturbance term. Ordinary least squares is then applied
to each of the reduced form equations to obtain estimates of each of the
endogenous variables (see Appendix A for an example derivation of a reduced
form).

For example, the reduced forms for the model in Figure 3, and equations
2 through 4, are

In equations 5 through 7, the variables are presumed to be in deviation form,
the $\hat{y}_g$ represent predicted scores for the endogenous variables, the
$\hat{\pi}_{gk}$ represent unbiased estimates of population reduced form
parameters ($\pi_{gk}$) based on OLS in a random sample, and the $m_g$ represent
disturbance terms for the reduced form equations.
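With K = 4 exogenous variables in the system, each reduced form equation would take roughly the following shape. This is a sketch reconstructed from the definitions in the text rather than a quotation of equations 5 through 7; in particular, the subscripted disturbance symbol $m_g$ and the double subscripts on the reduced form weights are assumptions.

```latex
y_g = \hat{y}_g + m_g
    = \hat{\pi}_{g1} x_1 + \hat{\pi}_{g2} x_2 + \hat{\pi}_{g3} x_3 + \hat{\pi}_{g4} x_4 + m_g,
\qquad g = 1, 2, 3 .
```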


It is important to note that the predicted y ($\hat{y}_g$) are exact functions
of the $x_k$ (exogenous variables) and thus the correlations between the
$\hat{y}_g$ and the disturbance term in a particular structural equation (i.e.,
equations 2 through 4) are equal to zero in the limit (cf. Johnston, 1972,
p. 383). That is, the $x_k$ are uncorrelated with the disturbance terms in the
limit and therefore exact functions of the $x_k$ will also be uncorrelated with
the disturbance terms in the limit. In a sample the $\hat{\pi}_{gk}$ are
estimates of the $\pi_{gk}$ and may not result in predicted y that have zero
sample correlations with the respective disturbance terms, although divergences
from zero tend to be smaller as sample sizes increase (Duncan, 1975).

Assuming that the correlations between the $\hat{y}_g$ and $d_h$ are at least
asymptotically equal to zero, it is possible to proceed to the second-stage
of the 2SLS procedure, which involves replacing the original sample values
of the $y_g$ with the $\hat{y}_g$ in equations 2 through 4 and conducting OLS.
The new equations for estimating the structural parameters, based on a random
sample, are

$$y_1 = b_{12} \hat{y}_2 + b_{13} \hat{y}_3 + c_{11} x_1 + c_{12} x_2 + d_1, \tag{8}$$

$$y_2 = b_{21} \hat{y}_1 + b_{23} \hat{y}_3 + c_{23} x_3 + d_2, \tag{9}$$

$$y_3 = b_{31} \hat{y}_1 + b_{32} \hat{y}_2 + c_{34} x_4 + d_3, \tag{10}$$

where the $c_{gk}$ for the exogenous variables are unchanged with respect
to equations 2 through 4, but the $b_{gh}$ for the endogenous variables
indicate that these estimates of the structural parameters are based
on predicted y rather than original y.
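The two stages can be sketched on simulated data for a three-equation system with the same pattern of included variables as equations 8 through 10. The population values in `B` and `C`, the sample size, and the helper name `second_stage` are hypothetical; the sketch only recovers the weights and does not compute standard errors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical population parameters: B[g, h] = b_gh, C[g, k] = c_gk.
B = np.array([[0.0, 0.4, 0.3],
              [0.3, 0.0, 0.2],
              [0.2, 0.3, 0.0]])
C = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

X = rng.normal(size=(n, 4))   # exogenous x_1..x_4 (one column each)
D = rng.normal(size=(n, 3))   # structural disturbances d_1..d_3
# Row form of the system: Y = Y B' + X C' + D, so Y = (X C' + D)(I - B')^{-1}.
Y = (X @ C.T + D) @ np.linalg.inv(np.eye(3) - B.T)

# First stage: regress every endogenous variable on all exogenous variables.
Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ Pi_hat            # predicted y, exact functions of the x_k


def second_stage(g, endog, exog):
    """OLS of observed y_g on predicted endogenous and observed exogenous predictors."""
    Z = np.column_stack([Y_hat[:, endog], X[:, exog]])
    coef, *_ = np.linalg.lstsq(Z, Y[:, g], rcond=None)
    return coef


eq8 = second_stage(0, [1, 2], [0, 1])   # b12, b13, c11, c12
eq9 = second_stage(1, [0, 2], [2])      # b21, b23, c23
eq10 = second_stage(2, [0, 1], [3])     # b31, b32, c34
print(eq8, eq9, eq10, sep="\n")
```

With a sample this large the second-stage weights should sit close to the generating values, which is the consistency property discussed in the text; in small samples they would be expected to wander further.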

The regression weights provided by equations 8 through 10 are consistent
estimators of the population structural parameters ($B_{gh}$), but they are not
generally unbiased, although the bias tends to become negligible in large
samples (cf. Johnston, 1972; Namboodiri et al., 1975). The parameter estimates
for a given set of structural equations are mathematically, but not
necessarily causally, unique if the equations are exactly identified or
overidentified. The significance of the estimated structural parameters can
be tested using well-known significance tests for unstandardized (partial)
regression coefficients (cf. Kerlinger & Pedhazur, 1973). The null hypothesis
is that the population structural parameter is equal to zero (i.e., has no
causal effect). In overidentified models it will be noted,
however, that one or more variables are presumed to have structural parameters
equal to zero in particular equations. As shown below, this provides an avenue
for assessing the goodness of fit of the data to the model, where in fact
the estimates of certain parameters may change (thus questioning the uniqueness
in a causal sense of the parameter estimates in the same, overall causal
model).

Goodness  of  fit.  As  discussed  earlier,  the  goal  of  causal  analyses 
based  on  correlational  data  is  to  examine  the  logical  consistency  of  alterna- 
tive causal  hypotheses  and  to  reject  those  that  are  untenable.  However, 
because  of  untestable  assumptions  or  sets  of  structural  equations  that  fit  the 
data  equally  well,  a simple,  "correct"  set  of  structural  equations  cannot  be 
ascertained. With respect to the present example, it is quite possible that
different models and therefore different sets of structural equations could
be developed. In fact, in the typical case a rather considerable number of
alternative models can be constructed (cf. Duncan, 1975). For example, the
reciprocal relationship between two of the endogenous variables could be
replaced with a single arrow from one to the other. Thus, the theoretical
assumptions and the questions of
the goodness of fit of a particular set of structural equations to a set of
data become salient. A method for testing the goodness of fit is presented,
which can be conducted only with overidentified equations.

The  test  of  goodness  of  fit  discussed  here  is  estimation  of  omitted 
parameters. Other, and often more sophisticated, procedures are available
(cf. Costner & Schoenberg, 1973; Kalleberg & Kluegel, 1975; Joreskog, 1973;
Namboodiri et al., 1975), but they generally require stronger assumptions
and the differences among the methods are not definitive (Johnston, 1972;
Namboodiri et al., 1975; King, Note 1).

In the example above, it will be recalled that structural equations 3
and 4 were overidentified. In essence this means that in these equations
certain variables were assumed to have population structural parameters
equal to zero. For example, in equation 3, the population structural parameters
$C_{21}$, $C_{22}$, and $C_{24}$ were assumed to be zero. A test of the goodness of
fit based on sample data would be to ascertain empirically if in fact at
least some of the estimates of these parameters (i.e., $c_{21}$, $c_{22}$, $c_{24}$) are
equal to zero. As outlined by Namboodiri et al. (1975), the parameters in
which there is the least faith of a zero value are inserted into the structural
equations until each structural equation is exactly identified. In
the example, only one sample structural parameter could be inserted into
each of equations 3 and 4 to achieve exact identification. Once the overidentified
equations have been exactly identified, a 2SLS analysis is conducted on the
new set of structural equations. Significance tests for all estimated structural
parameters are then conducted. Particular interest is attached to
(a) whether the estimated structural parameters which the causal model
specified as being equal to zero are in fact "approximately" zero (i.e.,
within the realm of sampling error), and (b) any meaningful changes in the
original estimates of the structural parameters when compared to the first
2SLS analysis with overidentified models.

The correlations between the exogenous (predetermined) variables and
the disturbance terms, as well as the correlations among the disturbance
terms from different equations, may also be checked (see Duncan, 1975, for
computing equations). Although the latter set of correlations is not constrained
to equal zero, large correlations would bring the model into
question. For example, a large correlation between two disturbance terms
could indicate the presence of omitted variables that should be in the model,
or of correlated, nonrandom errors.
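The omitted-parameter check described above can be sketched as follows, again on simulated data with hypothetical population values. Equation 3 is estimated once as specified and once with $c_{21}$ freed, which makes it exactly identified. The helper `tsls` is an illustrative name, and its standard errors use one common large-sample choice (residuals computed from the observed rather than the predicted endogenous variables); that choice is an assumption of this sketch and is not taken from the sources cited in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
# Hypothetical population parameters; c21, c22, and c24 are truly zero here.
B = np.array([[0.0, 0.4, 0.3],
              [0.3, 0.0, 0.2],
              [0.2, 0.3, 0.0]])
C = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(n, 4))
D = rng.normal(size=(n, 3))
Y = (X @ C.T + D) @ np.linalg.inv(np.eye(3) - B.T)


def tsls(y, endog, exog, instruments):
    """2SLS weights, standard errors, and t ratios for one equation."""
    first, *_ = np.linalg.lstsq(instruments, endog, rcond=None)
    Z_hat = np.column_stack([instruments @ first, exog])  # second-stage predictors
    Z_obs = np.column_stack([endog, exog])                # observed predictors
    coef, *_ = np.linalg.lstsq(Z_hat, y, rcond=None)
    resid = y - Z_obs @ coef          # residuals use the observed endogenous values
    s2 = resid @ resid / (len(y) - Z_obs.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z_hat.T @ Z_hat)))
    return coef, se, coef / se


y2, endog = Y[:, 1], Y[:, [0, 2]]     # dependent y_2; predictors y_1 and y_3
# Equation 3 as specified (overidentified): weights b21, b23, c23.
coef_over, _, _ = tsls(y2, endog, X[:, [2]], X)
# Equation 3 with c21 freed (exactly identified): b21, b23, c23, c21.
coef_exact, se_exact, t_exact = tsls(y2, endog, X[:, [2, 0]], X)

print("overidentified:", coef_over)
print("with c21 freed:", coef_exact)
print("t ratio for c21:", t_exact[3])
```

In this setup the freed parameter should come out near zero and the remaining weights should change little, which corresponds to outcomes (a) and (b) in the text. A clearly nonzero estimate, or a marked shift in the other weights, would count against the specified model.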

Summary 

Theoretical  and  mathematical  developments  in  this  section  are  explained, 
by  way  of  summary,  first  by  postulating  how  a nonrecursive  model  might  be 
applied  to  a salient  problem  in  social-organizational  psychology,  and  then  by 
reviewing  verbally  the  steps  involved  in  employing  2SLS  for  analysis  pur- 
poses. The  problem  selected  concerns  the  causal  determinants  of  leader  and 
subordinate  behaviors  in  formal  groups  (e.g.,  workgroups). 

A basic  assumption  underlying  much  of  the  leadership  research  has  been 
that  the  behavior  of  the  leader  toward  subordinates  is  a major  causal  factor 
in  respect  to  organizationally  related  attitudes  and  behaviors  of  subordinates 
(cf. Gibb, 1969; Kerr & Schriesheim, 1974; Likert, 1967; Stogdill, 1974;
Vroom, 1976). A considerable body of data provides support for this assumption,
which reflects an asymmetric or recursive model. However, an increasing
accumulation of research has indicated that leader behavior is at least
partially determined by the behaviors of subordinates and by leader-subordinate
relationships, and further that a particular leader may display different
(flexible) behaviors with different subordinates in the same workgroup
(cf. Barrow, 1976; Cummins, 1972; Dansereau, Graen, & Haga, 1975; Evans,
1973; Farris & Lim, 1969; Fiedler & Chemers, 1974; Green, 1973, 1975; Hill &
Hughes, 1974; House & Mitchell, 1974; Lowin & Craig, 1968).

The  latter  type  of  relationship  is  illustrated  by  studies  involving 
experimental or quasi-experimental designs (including cross-lagged panel
correlation) in which high or increasing subordinate performance levels
caused leaders to employ more supportive-consideration oriented behaviors,
while low or decreasing subordinate performance levels resulted in more use
of structuring-authoritarian behaviors on the part of the leader (Barrow,
1976; Dansereau et al., 1975; Green, 1973, 1975). Other studies, primarily
of  a correlational  nature,  have  suggested  that  a number  of  subordinate  vari- 
ables might  affect  supervisory  behavior,  either  directly  or  as  moderators. 
These  include  (a)  job  knowledge,  (b)  satisfaction,  (c)  role  ambiguity  and 
role  conflict,  (d)  locus  of  control  (e.g.,  internals  were  more  satisfied 
with  considerate  leader  behaviors),  (e)  race  of  both  supervisor  and  subor- 
dinate, (f)  perceived  organizational  independence,  (g)  hierarchical  level  of 
subordinate  in  the  organization,  (h)  expectations  concerning  leader  behavior 
and  rewards,  (i)  the  acceptance  of  the  leader  by  subordinates,  (j)  needs  for 
structure  and  independence  (and  the  degree  of  congruency  between  leaders  and 
subordinates  for  these  needs),  (k)  complexity  of  subordinate  tasks,  and  (1) 
various  other  needs  such  as  needs  for  achievement  and  performing  meaningful 
tasks (cf. Herold, 1974; House & Mitchell, 1974; Kerr & Schriesheim, 1974;
Lowin & Craig, 1968; Parker, 1976; Steers, 1975; Stogdill, 1974; Vroom,
1976). There are, of course, a number of additional contingencies, such as
the leader's hierarchical influence, specificity of goals, power and authority,
organizational incentives and feedback, organizational structure, and others
that might well enter into the determination of leader-subordinate
relationships.

This  is  not  an  exhaustive  review  and  it  is  realized  that  the  leadership 
process involves many contingencies. Nevertheless, there is ample evidence
to postulate that the causal relationships between leader and subordinate
behaviors are at least partially symmetric and reciprocal rather than
asymmetric. Furthermore, it appears that the effects of many of the reciprocal
interactions between leaders and subordinates are relatively rapid and thus
that meaningful time lags are essentially unidentifiable. Finally, if we
assume  that  the  reciprocal  relationships  between  leader  and  subordinate 
behaviors  tend  to  stabilize  in  situations,  then  a nonrecursive  model  is 
appropriate  for  attempting  to  identify  the  causal  factors  for  both  leader  and 
subordinate  behaviors. 

The  following  steps  provide  a rough  outline  of  the  application  of  2SLS 
for  the  analysis  of  the  proposed  nonrecursive  relationships  between  leader 
and  subordinate  behaviors.  For  exemplary  purposes,  we  shall  assume  that 
(a)  the  data  are  cross-sectional  and  based  on  natural  observations;  (b)  the 
unit  of  analysis  is  the  subordinate,  where  the  managerial  strategies  (Oldham, 
1976)  employed  by  each  leader  for  each  subordinate  are  measured,  while  other 
data  describing  the  leader  are  duplicated  for  each  subordinate;  (c)  moderator 
analyses  would  be  conducted  using  the  subgrouping  technique  (cf.  Guion,  1976) 
(this  avoids  the  potential  problem  of  multicollinearity  if  cross-product 
terms  were  used,  although  it  would  require  the  construction  of  separate  and 
perhaps different structural equations for each subgroup); (d) the structural
equations are identified; and (e) the structural equations are otherwise
correctly specified (e.g., assumptions regarding linearity, reliability,
random sampling, inclusion of all relevant causal factors, and so forth have
not been violated). The structural parameters for each dependent variable
for which a reciprocal relationship exists could then be estimated by the
following steps (these steps follow roughly those presented by Heise [1975,
p. 169]).

1.  Design a structural equation for each dependent variable that
expresses the values of the dependent variable as a function of other
endogenous variables with which the dependent variable has reciprocal
relationships, predetermined variables (nonlagged exogenous in this case) with
which there is a direct relationship, and a disturbance term. For our purposes
here, we shall assume that there are two sets of endogenous variables, namely
(a) leader managerial strategies (e.g., providing rewards and punishments,
setting goals, designing feedback systems, etc.), and (b) subordinate
behaviors (e.g., job performance levels on different behavioral criteria, reactions
to the leader, etc.). Reciprocal relationships are presumed to exist between
the sets of endogenous variables for at least one variable from each set,
and, if appropriate, for selected variables within each set.^1

It  is  also  assumed  that  there  are  three  sets  of  exogenous  variables, 
namely  (a)  variables  that  have  direct  effects  on  leader  behavior  and  indirect 
effects  (through  leader  behavior)  on  subordinate  behavior  (e.g.,  leader  intel- 
ligence, experience,  hierarchical  influence,  power,  authority,  etc.);  (b) 
variables  that  have  direct  effects  on  subordinate  behavior  and  indirect 
effects on leader behavior (e.g., subordinate intelligence, experience,
knowledge, satisfaction, motivation, etc.); and (c) variables that have
direct effects on both leader and subordinate behaviors (e.g., structure of
the  organization,  subsystem,  and  workgroup,  complexity  of  the  tasks,  con- 
gruency indices  between  leader  and  subordinate  needs,  expectations,  and  race, 
etc.).  Thus,  as  an  example,  the  structural  equation  for  a particular  subor- 
dinate behavior  (dependent  variable)  would  include  the  following  predictors 
(a)  leader  behaviors  which  have  a reciprocal  relationship  with  the  dependent 
variable,  (b)  other  subordinate  behaviors  which  have  a reciprocal  relation- 
ship with  the  dependent  variable,  and  (c)  exogenous  variables  that  have 
direct  effects  on  the  dependent  variable,  which  would  include  variables  that 
directly  affect  only  subordinate  behaviors  as  well  as  variables  that  directly 
affect  subordinate  behaviors  and  leader  behaviors. 

2.  Separate  from  the  system  of  all  equations  those  variables  which  are 
exogenous.  These  variables  cannot  include  a variable  with  which  the  recipro- 
cally related  endogenous  variables  (i.e.,  leader  and  subordinate  behaviors) 
have  a reciprocal  relationship. 

3.  Regress,  using  OLS,  each  of  the  reciprocally  related  endogenous 
variables  on  all  of  the  exogenous  variables  identified  in  step  2 to  obtain 
regression  equations  for  predicting  values  of  the  endogenous  variables 
(i.e.,  develop  a reduced  form  and  conduct  the  first-stage  regression).  Use 
the  first-stage  regression  equations  to  obtain  predicted  values  for  the 
reciprocally  related  endogenous  variables.  These  predicted  values  will  be 
purged,  in  the  limit,  of  their  correlations  with  the  disturbance  terms 
associated with the structural equations constructed in step 1.

4.  Return to the structural equations constructed in step 1 and
estimate the structural parameters by OLS, substituting the predicted values
of reciprocally related endogenous variables obtained in step 3 for the
original values of the endogenous variables (i.e., conduct the second-stage
regression).

At  this  point,  the  2SLS  procedure  and  hypotheses  regarding  parameter 
estimates  and  goodness  of  fit  can  be  addressed  following  procedures  discussed 
earlier. 

RANDOM  MEASUREMENT  ERROR 

Random measurement errors, particularly in the predetermined variables,
may  have  disturbing  effects  on  the  estimation  of  structural  parameters.  A 
general  treatment  of  the  effects  of  random  measurement  error  on  parameter 
estimation,  which  is  often  referred  to  in  econometrics  as  the  "error  in  vari- 
ables" problem,  is  described  below.  This  is  followed  by  an  introduction  to 
the  use  of  instrumental  variables  as  a solution  to  the  random  measurement 
error  problem,  and  a demonstration  of  the  relationships  between  instrumental 
variables  and  2SLS.  It  will  also  be  noted  that  2SLS  is  the  more  general 
procedure  because  it  can  be  used  with  overidentified  equations. 

From a general standpoint, a bivariate relationship between two variables
in a random sample may be displayed in deviation form as

$$y = b x + d \tag{11}$$

where b is an estimate of population structural parameter B, x is a
random variable which takes on values from a distribution of true
scores randomly sampled from the population, and d represents the
disturbance term.

For purposes of unbiased and consistent estimation using OLS, it is assumed
that the explanatory variable (x) is uncorrelated with the disturbance term
in the limit (i.e., plim $[(1/n) \sum x d] = 0$). However, it can be shown rather
easily that this assumption is incorrect if x involves a random error
component (only random measurement error is addressed here). That is, if the
observed x is equal to t + e, where t equals the true score on the variable
and e is a random measurement error, then equation 11 becomes

$$y = b x + (d - b e) \tag{12}$$

which was obtained by replacing x in equation 11 with x - e. (The x
in equation 11 assumes no measurement error, or conversely, a true
score. Thus, if the observed x is measured with error, the term in
equation 11 should be x - e [= t].)

In equation 12, x is correlated with the disturbance term (d - b e) in
the limit because x is a function of e (cf. Blalock et al., 1970; Bohrnstedt,
1969; Christ, 1966; Goldberger, 1971; Johnston, 1972; Theil, 1971; Wiley &
Wiley, 1971). Thus, the OLS estimate b of B will be both biased
and inconsistent (random measurement error in y is considered a part of the
d term and does not affect bias or consistency of parameter estimation). In
the bivariate case, the bias is in the direction of attenuating
(underestimating) the estimate of B. However, in the multivariate case with several
explanatory variables, each with different random measurement errors, the
bias in the estimates of the structural parameters may be positive or negative.
For example, as discussed by Blalock et al. (1970), it is possible (a) to
infer that a relationship between two variables is partly spurious when in
fact it is totally spurious, (b) to treat an additive model as if a
statistical interaction existed, and (c) to obtain incorrect estimates of
structural parameters, including a sign reversal (cf. Kenny, 1975),
particularly when the explanatory variables are correlated and have differing
degrees of random measurement error.
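The attenuation in the bivariate case can be illustrated with a small simulation. All numerical values below are hypothetical. The sketch also assumes that the measurement error e is uncorrelated with both the true score t and the disturbance d; under that assumption the standard large-sample result is that the OLS slope on the fallible x approaches B multiplied by the ratio of true-score variance to observed-score variance.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
B_true = 2.0                  # hypothetical population structural parameter
var_t, var_e = 1.0, 0.5       # assumed true-score and error variances

t = rng.normal(scale=np.sqrt(var_t), size=n)   # true scores
e = rng.normal(scale=np.sqrt(var_e), size=n)   # random measurement error
d = rng.normal(size=n)                         # disturbance term
y = B_true * t + d
x = t + e                                      # fallible observed predictor

# np.polyfit with degree 1 returns [slope, intercept].
b_true_scores = np.polyfit(t, y, 1)[0]   # slope when the predictor is error-free
b_observed = np.polyfit(x, y, 1)[0]      # slope when the predictor contains error
expected_limit = B_true * var_t / (var_t + var_e)

print(f"slope using true scores:     {b_true_scores:.3f}")
print(f"slope using observed scores: {b_observed:.3f}")
print(f"large-sample limit B * var_t / (var_t + var_e): {expected_limit:.3f}")
```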

A number  of  authors  have  proposed  approaches  for  dealing  with  random 
measurement  error  in  variables,  especially  if  the  observed  measures  are 
considered  as  fallible  "indicators"  of  unobserved  constructs  (e.g.,  true 
scores)  (cf.  Blalock,  1969,  1970;  Blalock  et  al.,  1970;  Bohrnstedt,  1969; 
Costner,  1969;  Duncan,  1975;  Goldberger,  1971;  Goldberger  & Duncan,  1973; 
Hauser  & Goldberger,  1971;  Heise,  1975;  Johnston,  1972;  Joreskog,  1973; 
Kalleberg  & Kluegel,  1975;  Kenny,  1975;  Namboodiri  et  al.,  1975;  Pindyck  & 
Rubinfeld,  1976;  Werts  & Linn,  1970;  Werts,  Linn,  & Joreskog,  1971;  Wiley, 
1973; Wiley & Wiley, 1971; Wold, 1975). We shall focus here only on
observables, and employ a popular approach known in econometrics as "instrumental
variables." The relationship between instrumental variables and 2SLS will
be  demonstrated.  It  should  be  noted,  however,  that  the  "multiple  indicator" 
procedures,  which  are  not  addressed  here,  have  a strong  tie  to  known  methods 
in  psychology  (e.g.,  confirmatory  factor  analysis,  multitrait-multimethod 
matrix)  and  thus  offer  another  attractive  alternative  to  the  analysis  of 
variables  with  random  measurement  error  and  to  the  analysis  of  unobservables. 

The  discussion  of  the  instrumental  variables  approach  and  its  relation- 
ship to  2SLS  generally  follows  presentations  by  Heise  (1975),  Johnston  (1972), 
and  Pindyck  and  Rubinfeld  (1976).  Beginning  with  equation  12,  where  x is 
correlated  with  the  disturbance  term,  the  instrumental  variables  approach 
proceeds by attempting to find a variable "z" such that (cf. Heise, 1975):

(1) $\operatorname{plim}\,[(1/n) \sum z\,(d - b\,e)] = 0$, which connotes that z is not
causally connected to factors which affect y but have been
omitted from the equation, nor is z related to the random
measurement error in x.

(2) z is a cause for x, preferably a direct cause although an
indirect cause through intervening variables is acceptable
as long as the intervening variables are not causally
related to y. In addition, z itself cannot affect y directly.
(If z is causally related to y in any way other than through
x, then it is possible for z to be causally related to
omitted causes for y and thus create a specification error, such
as violating the preceding assumption.)

(3) z is not affected causally by y or x.

(4) z may have random measurement error as long as such error is
not correlated with the disturbance term in equation 12 (a
highly reliable z is of course preferable).

Given  these  conditions,  with  accompanying  assumptions  of  linearity, 
additivity,  essentially  interval  measurement,  random  sampling,  and 
E(d - b e) = 0, it is possible to use z to obtain a consistent estimate of
B by the following equation

$$ b = \frac{\sum z\,y}{\sum z\,x} \qquad \text{(sum is from } 1, \ldots, n\text{)} \tag{13} $$

which replaces the usual OLS estimation, $b = (\sum x\,y) / (\sum x^2)$, and
where z is used as an instrument for x.

In the above equation, b will be a consistent estimator of B because

$$ \operatorname{plim}\, b = B + \frac{\operatorname{plim}\,(1/n) \sum z\,(d - b\,e)}{\operatorname{plim}\,(1/n) \sum z\,x} $$

where the numerator for the second term on the right side of the
equation approaches 0 in the limit (see assumption 1).
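The contrast between the OLS estimator and the instrumental variable
estimator can be illustrated numerically. The following sketch is
illustrative only and is not drawn from the sources cited above. It assumes
an instrument z that causes the true score t, a fallible measure x = t + e,
and y = B t + d with B = 2.0; the coefficient .8 linking z to t, the unit
variances, and all names are arbitrary assumptions.

```python
import random

def deviations(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_slope(x, y):
    """Usual OLS estimate: sum(x*y) / sum(x*x) on deviation scores."""
    x, y = deviations(x), deviations(y)
    return dot(x, y) / dot(x, x)

def iv_slope(z, x, y):
    """Instrumental variable estimate: sum(z*y) / sum(z*x) on deviation scores."""
    z, x, y = deviations(z), deviations(x), deviations(y)
    return dot(z, y) / dot(z, x)

def simulate(n=50000, B=2.0, seed=7):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]            # instrument
    t = [0.8 * zi + rng.gauss(0, 1) for zi in z]       # true score caused by z
    x = [ti + rng.gauss(0, 1) for ti in t]             # x = t + e, e independent of z
    y = [B * ti + rng.gauss(0, 1) for ti in t]         # y = B t + d, d independent of z
    return ols_slope(x, y), iv_slope(z, x, y)

b_ols, b_iv = simulate()
print(b_ols, b_iv)
```

With these assumed values the OLS estimate should remain well below B
regardless of sample size, while the instrumental variable estimate should
approach B as n increases.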

The rationale for equation 13 can be seen more clearly if the intuitive
and statistical relationships between instrumental variables and 2SLS are
demonstrated. In essence, equation 13 is represented by the model
z → x → y (Heise, 1975), where x is measured with error. Assuming that z
meets the criteria for an instrumental variable, the first stage of 2SLS
consists of replacing the fallible measure (x) with an estimate ($\hat{x}$) that is
not correlated with the disturbance term (d - b e) in the limit. In this
context, the first stage of 2SLS involves the creation of an instrument
(Pindyck & Rubinfeld, 1976). The second stage of 2SLS then involves the use
of the created instrument ($\hat{x}$) in place of the fallible measure to obtain a
consistent estimate of the structural parameter. For example, the
first stage of 2SLS consists of regressing x on z and obtaining an estimated
$\hat{x}$. The regression equation is

$$ x = \hat{a}\,z + m $$

where the predicted score (instrument) for x ($\hat{x}$) is equal to $\hat{a} z$,
which by definition is not correlated with (d - b e) in the limit.

The second stage of 2SLS then consists of replacing x with $\hat{x}$ in equation 12,
and conducting OLS, although a more direct comparison of 2SLS and
instrumental variables is shown by the following simple derivation.

For the usual OLS calculation of b, $(\sum x\,y) / (\sum x^2)$, the x in the
numerator is replaced by $\hat{x}$, and one x in the denominator is replaced
by $\hat{x}$ (replacing both x's in the denominator with $\hat{x}$ results in an
inconsistency [cf. Christ, 1966; Pindyck & Rubinfeld, 1976]). We
thus have

$$ b = \frac{\sum \hat{x}\,y}{\sum \hat{x}\,x} = \frac{\sum y\,(\hat{a} z)}{\sum x\,(\hat{a} z)} = \frac{\sum z\,y}{\sum z\,x} $$

where the last term is the same as the instrumental variable
estimator presented in equation 13.
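The equivalence of the two-stage calculation and equation 13 in the
single-instrument case can be checked numerically. The sketch below uses a
handful of made-up scores (they are not data from any study) and
hypothetical function names; it computes the instrumental variable estimate
directly and then again by way of a first-stage regression of x on z.

```python
def deviations(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def iv_estimate(z, x, y):
    """Instrumental variable estimate: sum(z*y) / sum(z*x)."""
    return dot(z, y) / dot(z, x)

def two_stage_estimate(z, x, y):
    """First stage: regress x on z. Second stage: use x_hat as the instrument."""
    a_hat = dot(z, x) / dot(z, z)           # first-stage slope
    x_hat = [a_hat * zi for zi in z]        # predicted scores (created instrument)
    return dot(x_hat, y) / dot(x_hat, x)    # x_hat replaces one x in each product

# Made-up illustrative scores, converted to deviation form.
z = deviations([1.0, 2.0, 3.0, 4.0, 5.0])
x = deviations([1.2, 1.9, 3.4, 3.8, 5.1])
y = deviations([2.0, 4.1, 6.3, 7.7, 10.2])

b_iv = iv_estimate(z, x, y)
b_2sls = two_stage_estimate(z, x, y)
print(b_iv, b_2sls)
```

The first-stage slope appears in both the numerator and the denominator of
the two-stage ratio and cancels, which is why the two printed values should
agree up to rounding error.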

In  this  example,  2SLS  is  equivalent  to  instrumental  variables.  This 
will  not  always  be  the  case;  the  instrumental  variables  approach  typically 
focuses on the use of only one instrument for each fallible variable. Where
more than one instrument exists for a fallible variable, each instrument
provides a separate estimate for the structural parameter (e.g., equation 13
is replicated for each instrument), and the problem is then to decide which
estimate to accept, or how to combine the separate estimates to arrive at
one estimate (cf. Goldberger, 1971, 1973). On the other hand, 2SLS
automatically accommodates multiple instruments for each fallible variable because
a least squares weighted combination of multiple instruments ($z_i$) can be
employed to create a new instrument ($\hat{x}$) for each fallible variable in the
first-stage regression. In other words, the first stage of 2SLS provides
the basis for developing a weighted linear combination of the original
instruments in the creation of a new instrument.
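The way in which 2SLS combines several instruments can be sketched as
follows. This is an illustration under assumed conditions rather than a
procedure taken from the cited sources: two independent instruments z1 and
z2 are assumed to cause the true score with arbitrary weights .6 and .5,
B is set to 2.0, and all names are hypothetical. The separate instrumental
variable estimates are printed alongside the single 2SLS estimate.

```python
import random

def deviations(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_stage_two_instruments(z1, z2, x):
    """Least squares weights (a1, a2) from regressing x on z1 and z2."""
    s11, s22, s12 = dot(z1, z1), dot(z2, z2), dot(z1, z2)
    s1x, s2x = dot(z1, x), dot(z2, x)
    det = s11 * s22 - s12 * s12              # determinant of the normal equations
    a1 = (s22 * s1x - s12 * s2x) / det
    a2 = (s11 * s2x - s12 * s1x) / det
    return a1, a2

def simulate(n=20000, B=2.0, seed=11):
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    t = [0.6 * p + 0.5 * q + rng.gauss(0, 1) for p, q in zip(z1, z2)]
    x = [ti + rng.gauss(0, 1) for ti in t]   # fallible measure of t
    y = [B * ti + rng.gauss(0, 1) for ti in t]
    z1, z2, x, y = (deviations(v) for v in (z1, z2, x, y))
    b_z1 = dot(z1, y) / dot(z1, x)           # separate estimate from z1 alone
    b_z2 = dot(z2, y) / dot(z2, x)           # separate estimate from z2 alone
    a1, a2 = first_stage_two_instruments(z1, z2, x)
    x_hat = [a1 * p + a2 * q for p, q in zip(z1, z2)]   # weighted combination
    b_2sls = dot(x_hat, y) / dot(x_hat, x)
    return b_z1, b_z2, b_2sls

b_z1, b_z2, b_2sls = simulate()
print(b_z1, b_z2, b_2sls)
```

The two single-instrument estimates generally differ from one another in any
given sample, whereas the first-stage weights yield one estimate without a
separate decision about how to combine them.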

In more general terms, 2SLS and the instrumental variables approach
will provide unique and identical parameter estimates in an exactly
identified system of simultaneous equations if all predetermined variables are
used in the first stage of 2SLS and the instruments (analogous to $\hat{x}$) used in
the second stage of 2SLS consist of the predicted values from the first-stage
(reduced form) regressions (Pindyck & Rubinfeld, 1976). In overidentified
structural equations, however, 2SLS again provides unique parameter estimates
while the instrumental variables approach does not. Furthermore, as noted
by Goldberger (1973, p. 151), 2SLS "is as efficient as any other instrumental
variable estimator in the present context," which referred to the weighting
and combination of instruments in overidentified structural equations.

The  application  of  2SLS  to  simultaneous  equations  where  some  variables 
involve  random  measurement  errors  is  summarized  below,  and  follows  the  pre- 
sentation by  Johnston  (1972).  To  simplify  matters,  it  was  assumed  that  the 
equations were based on a recursive model, although nonrecursive or block-
recursive models could also be treated in this general paradigm (i.e., both
nonrecursiveness  and  random  measurement  error  would  have  to  be  addressed  as 
reasons  for  correlations  among  explanatory  variables  and  disturbance  terms). 
Matrix  algebra  was  employed  to  conserve  space  (the  reader  may  wish  to  con- 
sult Appendix  B before  proceeding  with  this  section). 

To begin, one equation from the system of simultaneous equations is
represented by

$$ y = Y_1 b + X_1 c + u \tag{14} $$

where y is an n x 1 vector of observations (raw scores) for an
endogenous variable,

$Y_1$ is an n x g matrix of observations on variables which include
random measurement errors and are correlated with the disturbance
term ($Y_1$ does not include y),

b is a g x 1 vector of structural parameter estimators attached to
the $Y_1$ variables,

$X_1$ is an n x k matrix of observations on variables appearing in this
equation which are not correlated with the disturbance term,

c is a k x 1 vector of structural parameter estimators attached to
the $X_1$ variables, and

u is an n x 1 vector of disturbances, which can also be written as
(d - E b), where d is the vector of original disturbance terms
and E b, analogous to b e in equation 12, represents the effects of
random measurement errors in the $Y_1$ variables.

If it is presumed that the equations are identified and all assumptions
for 2SLS have been met, with the exception of the random measurement errors
in the $Y_1$ variables, then the specification error for the above equation is

$$ \operatorname{plim}\left(\tfrac{1}{n}\, Y_1' u\right) \neq 0 $$

which connotes that the variables in $Y_1$ are correlated with the
disturbance term in the limit, and further that the estimates b of the
structural parameters will be biased and inconsistent.

It is important to reiterate that the variables in $X_1$ are not correlated
with u in the limit. In addition, there will exist a set of variables in the
remainder of the simultaneous equations that are also not correlated with the
disturbance terms. These variables do not appear in $X_1$, and will be considered
as comprising a matrix $X_2$. (Although we are not dealing with a nonrecursive
model here, perhaps an analogy to such a model would be of assistance. That
is, the $Y_1$ variables are analogous to the interdependent endogenous variables
in nonrecursive models in the sense that they are correlated with the
disturbance terms. The $X_1$ and $X_2$ variables are analogous to predetermined
variables in the sense that they are not correlated with the disturbance
terms.)

The application of 2SLS to the above situation is designed to develop
a set of instruments for the $Y_1$ variables in the first-stage regression
which are purged of their correlations with the disturbance term, and then
to use these instruments in place of $Y_1$ in the second-stage regression in
order to obtain consistent estimates of the structural parameters. The
reduced form employed in the first-stage regression for the estimation of
the $Y_1$ variables is based on both the $X_1$ and $X_2$ variables, which by
definition are not correlated with the disturbance term. Thus, estimates of $Y_1$
(i.e., $\hat{Y}_1$), which are direct linear functions of $X_1$ and $X_2$, will also not
be correlated with the disturbance term. The reduced form, first-stage
regression is therefore (Johnston, 1972, p. 381):

$$ \hat{Y}_1 = X (X'X)^{-1} X' Y_1 \tag{15} $$

where $X = [X_1 \;\; X_2]$.
The second-stage regression proceeds by first noting that $\hat{Y}_1 = Y_1 - V_1$,
where $V_1$ is an n x g matrix of the residuals obtained from regressing $Y_1$ on
X. Second, in matrix terminology the instruments used in the second-stage
regression are comprised by the matrix $[Y_1 - V_1 \;\; X_1]$, which will be employed
in place of $[Y_1 \;\; X_1]$, the original observation matrix (the reader will note
that $X_1$ does not change in the above two matrices and that $Y_1 - V_1$ provides
a computing method for $\hat{Y}_1$ which precludes the need to actually determine the
values for $\hat{Y}_1$ [Johnston, 1972, pp. 382 and 390]). Given these conditions,
the estimating equations for the second stage of 2SLS are:

$$ \begin{bmatrix} b \\ c \end{bmatrix} = \begin{bmatrix} Y_1'Y_1 - V_1'V_1 & Y_1'X_1 \\ X_1'Y_1 & X_1'X_1 \end{bmatrix}^{-1} \begin{bmatrix} (Y_1 - V_1)'y \\ X_1'y \end{bmatrix} \tag{16} $$

The above computing equations will provide consistent estimates of the
structural parameters as long as

$$ \operatorname{plim}\left[\tfrac{1}{n}\,(Y_1 - V_1)'u\right] = 0 \quad \text{and} \quad \operatorname{plim}\left(\tfrac{1}{n}\, X_1'u\right) = 0 $$

(cf. Johnston, 1972; Pindyck & Rubinfeld, 1976).
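A short supplementary derivation, not part of the original presentation,
indicates why working with $Y_1 - V_1$ avoids computing $\hat{Y}_1$ explicitly
in the second stage. It assumes only that $V_1$ contains the first-stage OLS
residuals and that the columns of $X_1$ are among the columns of X; the
symbol P is introduced here for convenience.

```latex
% P is the projection onto the columns of X = [X_1  X_2] (notation assumed here).
\begin{align*}
P &= X (X'X)^{-1} X', \qquad P' = P, \qquad P P = P, \qquad P X_1 = X_1, \\
\hat{Y}_1 &= P\,Y_1, \qquad V_1 = (I - P)\,Y_1 = Y_1 - \hat{Y}_1, \\
\hat{Y}_1'\hat{Y}_1 &= Y_1' P\,Y_1 = Y_1'Y_1 - Y_1'(I - P)\,Y_1 = Y_1'Y_1 - V_1'V_1, \\
\hat{Y}_1'X_1 &= Y_1' P\,X_1 = Y_1'X_1, \\
\hat{Y}_1'y &= (Y_1 - V_1)'y .
\end{align*}
```

These identities imply that an OLS regression of y on $[\hat{Y}_1 \;\; X_1]$ can be
computed entirely from cross-products of the observed matrices and the
first-stage residuals.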

In  summary,  an  outline  of  the  use  of  instrumental  variables  and  2SLS  in 
situations  where  some  explanatory  variables  are  correlated  with  the  distur- 
bance term  for  reasons  of  random  measurement  error  has  been  presented.  Space 
limitations  preclude  a thorough  discussion  of  specification  errors  which  are 
salient  for  the  random  measurement  error  problem;  however,  we  shall  briefly 
mention  some  of  the  more  important  of  these  errors  (it  should  also  be  noted 
that all assumptions regarding the use of 2SLS are operational).

Of  initial  concern  are  the  criteria  mentioned  earlier  for  the  selection 
of  instrumental  variables.  For  example,  as  discussed  by  Fisher  (1971)  and 
Blalock  et  al.  (1970),  it  is  often  difficult  to  obtain  instrumental  vari- 
ables which  are  uncorrelated  in  the  limit  with  the  disturbance  term  while  at 
the  same  time  being  major  and  direct  causes  for  the  fallible  variables. 
Moreover, the use of multiple instruments for one variable (i.e., in
overidentified equations) may result in the problem of multicollinearity. In
the former case, it may be necessary to use instrumental variables which have
low rather than zero correlations with the disturbance term in the limit,
and/or are indirect causes for the explanatory variables comprised partially
of random measurement errors. In the latter case, some instrumental
variables may have to be deleted or a principal components analysis conducted
on the instrumental variables in order to obtain independent predictors
(Amemiya, 1966; Johnston, 1972).

Another  set  of  concerns  pertains  to  the  decision  of  whether  to  use  instru- 
mental variables  and/or  2SLS  versus  OLS  when  the  criteria  for  instrumental 
variables  have  not  been  fully  met.  For  example,  Blalock  et  al.  (1970) 
demonstrated  that  the  use  of  an  instrumental  variables  approach  (and  2SLS) 
may  be  inferior  to  OLS  if  an  instrumental  variable  is  related  to  the  dependent 
variable either directly or indirectly. Finally, as noted earlier the
multiple indicator approaches, which focus on unobservables (e.g., confirmatory
factor  analysis),  provide  another  avenue  for  addressing  variables  with  random 
measurement  errors,  and  perhaps  a combination  of  the  procedures  discussed 
here  and  the  multiple  indicator  approaches  might  well  provide  the  most  viable 
methods  of  analysis  for  the  random  measurement  error  problem. 

In  conclusion,  it  must  be  stressed  that  the  procedures  described  in  this 
section  do  not  provide  a solution  when  unreliable  variables  are  used.  For 
example,  one  could  question  the  efficacy  of  the  predicted  scores  following 
the  first-stage  regression  if  the  variables  to  be  replaced  were  highly  unre- 
liable to  begin  with.  Rather,  it  is  presumed  that  the  variables  to  be 
replaced are at least moderately reliable. Furthermore, it is also
questionable that instruments should be developed for variables that are highly, but
not perfectly, reliable (cf. Blalock et al., 1970). That is, the
specification errors associated with developing instrumental variables (e.g.,
multicollinearity) might have more serious contaminating effects on parameter
estimation than simply proceeding with highly reliable variables and
recognizing that some bias and inconsistency might be present in estimation.
In fact, Duncan (1975) has noted that highly reliable variables encompassing
small amounts of random measurement errors would not be likely to provide
undue strain on structural models.

LAGGED  ENDOGENOUS  VARIABLES 

The  present  application  of  2SLS  addresses  the  question  of  dynamic 
analysis, namely the inclusion of lagged explanatory variables in structural
equations.  As  noted  earlier,  both  endogenous  and  exogenous  variables  may  be 
lagged.  Furthermore,  the  model  may  be  recursive  or  nonrecursive,  or  a com- 
bination of  both  recursive  and  nonrecursive  (e.g.,  block-recursive).  In  the 
application  of  2SLS  selected  for  discussion  here,  the  rather  thorny  problem 
of  including  lagged  endogenous  variables  in  nonrecursive  structural  equations 
is  presented.  This  application  provided  an  opportunity  to  demonstrate  how 
two applications of 2SLS might be combined (i.e., nonrecursive models and
lagged  endogenous  variables);  however,  the  application  is  not  exhaustive  of 
the  applications  of  2SLS  for  structural  models  with  lagged  variables.  For 
example,  Johnston  (1972,  pp.  318-320)  has  presented  a procedure  which  includes 
a version  of  2SLS  for  analyzing  recursive  models  with  lagged  endogenous  vari- 
ables. 

Another  reason  for  selecting  this  application  of  2SLS  was  its  implications 
for  a currently  popular  procedure  employed  for  causal  analysis  in  psychology, 
namely the cross-lagged panel correlation design (Campbell, 1963; Cook &
Campbell, 1976; Campbell & Stanley, 1963; Feldman, 1975; Kenny, 1973, 1975). In
fact, it is hoped that this discussion will serve to encourage psychologists
to begin to think of cross-lagged panel correlation designs in terms of their
place  in  a more  holistic  theoretical  system  as  well  as  in  terms  of  competing 
causal  hypotheses  (e.g.,  nonrecursive  rather  than  recursive  models).  With 
this  interest  in  mind,  the  application  of  2SLS  for  nonrecursive  structural 
equations  which  include  lagged  values  of  one  or  more  endogenous  variables  was 
addressed  by  formulating  the  analysis  of  the  cross-lagged  panel  correlation 
design  in  terms  of  structural  equations. 

The  reader  is  referred  to  Kenny  (1975)  for  a review  of  the  cross-lagged 
panel  correlation  (XLPC)  design.  We  shall  focus  here  on  a brief  comparison 
of the goals of XLPC and structural equations, and then proceed to the
application of structural equations to the XLPC design. As shown in Figure 4,
the XLPC design involves two dependent or endogenous variables measured at
the same time ($Y_{1t}$ and $Y_{2t}$, where t represents observations), and two lagged
values of the endogenous variables, both measured at time t-1 ($Y_{1t-1}$ and
$Y_{2t-1}$). The latter variables are considered predetermined in the present
context. The XLPC analysis is a test for spuriousness, the null hypothesis
being that the relationship between $Y_{2t}$ and $Y_{1t-1}$ (for example) is due to the
effects of one or more other variables rather than causal relationships between
the two variables (Kenny, 1975). However, failure to reject the null
hypothesis is not sufficient to conclude that the relationship was in fact
spurious (cf. Kenny, 1975).

Insert Figure 4 about here

As noted by Kenny, the XLPC design is intermediary, in terms of causal
explanation,  between  purely  correlational  designs  and  well-elaborated  struc- 
tural models.  Kenny  further  reported  that  although  XLPC  and  structural 
models  have  been  contrasted,  the  two  models  address  different  objectives  and 
make  somewhat  different  assumptions.  For  example,  the  XLPC  design  is  a test 
for  spuriousness,  does  not  require  that  all  causal  variables  be  included  in 
the  model,  and  does  accommodate  measurement  error.  Structural  equations,  on 
the  other  hand,  focus  on  the  estimation  of  causal  parameters,  and,  as  noted 
earlier,  omitted  causal  variables  and  measurement  error  result  in  specifica- 
tion errors.  Thus,  structural  equations  are  typically  more  demanding,  both 
in  terms  of  theory  and  psychometric/statistical  criteria.  Kenny  also  con- 
cluded that  XLPC  designs  were  more  applicable  to  social  science  data,  given 
the  present  state  of  theoretical  systems  and  the  pragmatics  of  measurement. 
While  we  agree  with  this  conclusion,  we  also  feel  that  a great  deal  is  to  be 
gained  by  thinking  in  terms  of  more  complete  theoretical  systems,  identifying 
sources  of  spuriousness  (i.e.,  omitted  variables),  and  improving  upon 
measurement  techniques.  For  these  reasons,  the  application  of  structural 
equations  to  XLPC  designs  was  addressed,  with  the  presumption  that  such 
applications  represent  a desirable  goal  for  psychology. 

A number of authors have proposed methods for transforming the XLPC
design into structural (or path) equations (cf. Duncan, 1969; Bohrnstedt,
1969; Goldberger, 1971; Heise, 1970; Pelz & Lew, 1970). These equations
typically involve lagged endogenous variables, including (a) lagged values of
the dependent variable (e.g., $Y_{1t-1}$ is viewed as a cause for $Y_{1t}$) and (b) what
will be referred to here as "cross-lagged endogenous variables" (e.g., $Y_{1t-1}$
is viewed as a cause for $Y_{2t}$). Furthermore, as discussed later it can
frequently be assumed that serial correlation exists among the disturbance
terms for structural equations representing an endogenous variable measured
at different points in time. Such conditions require special and somewhat
complex statistical procedures for solution, including several modified
versions of 2SLS (cf. Amemiya, 1966; Fair, 1970; Fisher, 1971; Johnston, 1972;
Miller, 1971; Nerlove, 1971; Pindyck & Rubinfeld, 1976). The procedure
presented by Fair (1970) was recommended by Pindyck & Rubinfeld (1976) as an
optimal solution for structural equations involving lagged endogenous
variables, and was used as the basis for this presentation.

In constructing the structural equations for the XLPC design and lagged
endogenous variables, the possibility of a nonrecursive relationship between
the dependent variables was added to the model presented in Figure 4 (i.e.,
a reciprocal interaction between $Y_{1t}$ and $Y_{2t}$). Although XLPC designs typically
rule out the possibility of nonrecursive causal relationships between the
dependent variables "by fiat" (Cook & Campbell, 1976), their inclusion in the
model provided a more general discussion of lagged endogenous variables while
at the same time attending to the concerns of several authors that such
relationships may be meaningful, competing hypotheses for XLPC (Duncan, 1969;
Goldberger, 1971; Heise, 1970). The structural equations for the XLPC design
were therefore

$$ y_{1t} = b_{12}\, y_{2t} + c_{13}\, y_{2t-1} + c_{14}\, y_{1t-1} + d_{1t} \tag{17} $$

$$ y_{2t} = b_{21}\, y_{1t} + c_{24}\, y_{1t-1} + c_{25}\, y_{2t-1} + d_{2t} \tag{18} $$

where all variables are presented in deviation form.

In each equation, the endogenous variable is seen as a function of
(a) the other endogenous variable measured at both time t (a nonrecursive
relationship) and time t-1 (a cross-lagged, first-order autoregressive
relationship), and (b) a lagged value of the endogenous variable (a lagged,
first-order autoregressive relationship). All of the variables with $c_{jk}$
regression coefficients on the right side of equations 17 and 18 are
regarded as predetermined. For the present purposes, the following assumptions
were made: (1) linearity and additivity, (2) essentially interval
measurement, (3) no measurement errors, (4) $E(d_{jt}) = 0$, and (5) random sampling.
In addition, it was assumed that: (6) the model followed a first-order
autoregressive scheme with discrete time lags (this allowed the use of
difference rather than differential equations), (7) the variables were measured
at the same points in time (i.e., synchronicity [cf. Kenny, 1975]), (8) the
measurement intervals corresponded to the causal intervals, and (9) the
structural relationships were invariant with respect to time (i.e.,
stationarity [cf. Kenny, 1975; Pindyck & Rubinfeld, 1976]).

As  with  earlier  applications,  a number  of  the  above  assumptions  are 
difficult  to  meet  in  research.  Problems  associated  with  the  first  four 
assumptions  have  been  discussed,  while  problems  regarding  the  latter  set  of 
assumptions  are  discussed  in  a number  of  publications  cited  earlier  which 
deal  with  XLPC  designs,  autocorrelation,  and/or  time-series  analyses.  We 
shall  focus  here  on  a third  set  of  assumptions  which  intrinsically  cause 
estimates  of  the  structural  parameters  in  structural  equations  17  and  18  to 
be  inconsistent.  In  general,  these  assumptions  may  be  categorized  as  follows: 
(a) the disturbance terms for each dependent variable are likely to be
serially correlated over time (the disturbance terms $d_{1t}$ and $d_{2t}$ are also
likely to be correlated because of the nonrecursive relationship and because
of serial correlation among the disturbance terms), and (b) all predetermined
variables are likely to be correlated with one or both disturbance terms.
Each of these assumptions is addressed below. It should also be noted that
equations 17 and 18 are underidentified.

10. The current values of the endogenous variables are correlated in
the limit with the disturbance terms of the equations in which they are used
as predictors.

That is:

$$ \operatorname{plim}\left(\tfrac{1}{n} \sum Y_{2t}\, d_{1t}\right) \neq 0, \quad \text{and} \quad \operatorname{plim}\left(\tfrac{1}{n} \sum Y_{1t}\, d_{2t}\right) \neq 0 $$

This is due to the nonrecursive nature of the design.
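The correlation between a current endogenous predictor and the disturbance
term can be illustrated with a simplified simulation. The sketch below is
an assumption-laden illustration, not the authors' analysis: the lagged
terms of equations 17 and 18 are omitted, the disturbances are taken to be
independent with unit variance, and the values b12 = .5 and b21 = .4 are
arbitrary. The reduced form is obtained by solving the two equations
y1 = b12 y2 + d1 and y2 = b21 y1 + d2 simultaneously.

```python
import random

def simulate_nonrecursive(n=50000, b12=0.5, b21=0.4, seed=5):
    rng = random.Random(seed)
    denom = 1.0 - b12 * b21
    sum_y2_d1 = 0.0
    sum_y1_y2 = 0.0
    sum_y2_y2 = 0.0
    for _ in range(n):
        d1 = rng.gauss(0, 1)
        d2 = rng.gauss(0, 1)
        # Reduced form of the two simultaneous equations.
        y1 = (d1 + b12 * d2) / denom
        y2 = (b21 * d1 + d2) / denom
        sum_y2_d1 += y2 * d1
        sum_y1_y2 += y1 * y2
        sum_y2_y2 += y2 * y2
    cov_y2_d1 = sum_y2_d1 / n            # sample analogue of plim (1/n) sum(y2 * d1)
    ols_b12 = sum_y1_y2 / sum_y2_y2      # OLS slope of y1 on y2
    return cov_y2_d1, ols_b12

cov_y2_d1, ols_b12 = simulate_nonrecursive()
print(cov_y2_d1, ols_b12)
```

Under these assumptions the average cross-product of y2 and d1 should settle
near b21 / (1 - b12 b21) rather than zero, and the OLS slope of y1 on y2
should overstate b12 accordingly.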

11. The disturbance terms for the current and lagged endogenous variables
are most likely serially correlated. This problem can be visualized by means
of Figure 5, which employed Heise (1970) as a base. $Y_{1t-2}$ and $Y_{2t-2}$ represent
an assumed, but not actually measured, additional wave of data. Potentially
estimable relationships given two waves of data are depicted by solid lines,
although because equations 17 and 18 are underidentified, no causal
relationships could be estimated until additional predetermined variables are added.
Dashed lines delineate implied relationships, and the curved line between
$Y_{1t-1}$ and $Y_{2t-1}$ denotes that these variables are being treated as
predetermined in equations 17 and 18 (with two waves of data, the relationship
between these variables is estimable but not causal).


Insert Figure 5 about here

As an example, if a stable, omitted variable exists which is a cause
for $Y_{1t}$, $Y_{1t-1}$, and $Y_{1t-2}$, then the disturbance terms $d_{1t}$, $d_{1t-1}$, and $d_{1t-2}$ will
be serially correlated (i.e., the stable, omitted variable, which is part
of the disturbance terms, correlates with itself over time). This problem
can be avoided by including all relevant causal variables in the equations
so that the disturbance terms reflect only random and unstable influences.
Because equations 17 and 18 include only a few variables, it is likely that
stable, causal variables have been omitted, thus resulting in a serial
correlation among the disturbance terms.

The correlations among the disturbance terms for times t and t-1 may be
represented as

$$ d_{1t} = \rho_{11}\, d_{1t-1} + e_{1t} \quad \text{and} \quad d_{2t} = \rho_{22}\, d_{2t-1} + e_{2t} $$

where $\rho_{11}$ and $\rho_{22}$ represent first-order serial correlation coefficients
which vary between 1 and -1 (if $|\rho|$ is greater than +1, the system
explodes), and $e_{1t}$ (and $e_{2t}$) is a random error component, distributed
$N(0, \sigma^2)$, and is independent of other disturbances for $Y_{1t}$ ($Y_{2t}$)
measured at different points in time, including $d_{1t-1}$ ($d_{2t-1}$) (cf. Pindyck
& Rubinfeld, 1976).

Although  not  discussed  here,  tests  for  serial  correlation  of  disturbances  in 
the  presence  of  lagged  endogenous  variables  are  presented  in  the  econometrics 
literature  (cf.  Johnston,  1972,  pp.  312-313). 

Several procedures are available for removing the serial correlation from
the disturbance terms, including first-order differencing, generalized
differencing, and generalized least squares estimation. The generalized differencing
and generalized least squares procedures are approximately equivalent for
first-order serial correlation (Johnston, 1972); the former technique is briefly
described here and the reader is referred to econometrics texts for more
extensive treatments of both procedures. In general, the generalized differencing
process replaces $\rho$, which is generally unknown, with an estimated value
(processes for estimation are not discussed here), and then replaces each term
in the structural equation with a difference score based upon the estimated $\rho$
times a first-order lagged variable. For example, equation 17 would be

$$ y_{1t} - \hat{\rho}_{11}\, y_{1t-1} = b_{12}\,(y_{2t} - \hat{\rho}_{11}\, y_{2t-1}) + c_{13}\,(y_{2t-1} - \hat{\rho}_{11}\, y_{2t-2}) + c_{14}\,(y_{1t-1} - \hat{\rho}_{11}\, y_{1t-2}) + (d_{1t} - \hat{\rho}_{11}\, d_{1t-1}) $$

where $\hat{\rho}_{11}$ represents an estimate of $\rho_{11}$ (a population term), and an
additional wave of data would have to be obtained (i.e., $Y_{1t-2}$ and
$Y_{2t-2}$).

As discussed later, if $\hat{\rho}_{11}$ is a "good" estimate of $\rho_{11}$, then the
disturbance term for the above equation would be $e_{1t}$, the random error component.
That is, $d_{1t} - \hat{\rho}_{11}\, d_{1t-1}$ will be equal to $e_{1t}$, and serial correlation will
have been removed from the disturbance term.
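The effect of generalized differencing on a serially correlated disturbance
can be illustrated with a simulated series. The sketch below is a
simplification offered for illustration only: it assumes a single long
series of first-order autoregressive disturbances with a known coefficient
of .6 (in practice the coefficient must be estimated), and all names are
hypothetical.

```python
import random

def lag1_autocorr(v):
    """First-order serial correlation of a series, using deviation scores."""
    m = sum(v) / len(v)
    dev = [a - m for a in v]
    num = sum(dev[i] * dev[i - 1] for i in range(1, len(dev)))
    den = sum(a * a for a in dev)
    return num / den

def ar1_series(length, rho, rng):
    """Disturbances following d_t = rho * d_(t-1) + e_t with e_t ~ N(0, 1)."""
    d = [rng.gauss(0, 1)]
    for _ in range(length - 1):
        d.append(rho * d[-1] + rng.gauss(0, 1))
    return d

def generalized_difference(v, rho):
    """Replace each score with v_t - rho * v_(t-1); the first score is lost."""
    return [v[i] - rho * v[i - 1] for i in range(1, len(v))]

rng = random.Random(3)
rho = 0.6
d = ar1_series(20000, rho, rng)
r_before = lag1_autocorr(d)
r_after = lag1_autocorr(generalized_difference(d, rho))
print(r_before, r_after)
```

The serial correlation of the original disturbances should fall near the
assumed coefficient, while the serial correlation of the differenced
disturbances should fall near zero, since only the random error component
remains.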

12. If the disturbance terms are serially correlated, then the lagged
values of the endogenous variables will be correlated in the limit with the
disturbance terms for both equations 17 and 18. That is,

$$ \operatorname{plim}\left(\tfrac{1}{n} \sum y_{1t-1}\, d_{1t}\right), \quad \operatorname{plim}\left(\tfrac{1}{n} \sum y_{1t-1}\, d_{2t}\right), \quad \operatorname{plim}\left(\tfrac{1}{n} \sum y_{2t-1}\, d_{2t}\right), \quad \text{and} \quad \operatorname{plim}\left(\tfrac{1}{n} \sum y_{2t-1}\, d_{1t}\right) $$

are all not equal to 0.

For example, unmeasured causes of $Y_{1t-1}$ are related to unmeasured causes of
$Y_{1t}$ through the serially correlated disturbances $d_{1t-1}$ and $d_{1t}$ in Figure 5, thus
building in a correlation between $Y_{1t-1}$ and $d_{1t}$. Moreover, following the
above logic, paths can be used to show a relationship between $Y_{1t-1}$ and $d_{2t}$
(or $Y_{2t-1}$ and $d_{1t}$).

13. It will be assumed that following the generalized differencing
process discussed above, the lagged values of the endogenous variables
($Y_{1t-s}$, $Y_{2t-s}$, $s = 1, \ldots, S$) will not be correlated in the
limit with either $e_{1t}$ or $e_{2t}$ (Fair, 1970). This provides a basis for
consistent and asymptotically efficient estimation of the structural
parameters, although the estimates will be biased in small samples (Johnston,
1972). However, due to the nonrecursive nature of the model, the conditions
described in assumption 10 for the correlations between the current values of
the endogenous variables and the disturbance terms will still be in effect for
$e_{1t}$ and $e_{2t}$ (e.g., $Y_{2t}$ will still be correlated in the limit
with $e_{1t}$).

In summary, OLS estimates of the structural parameters for equations 17
and 18 will be biased and inconsistent because the model is nonrecursive, the
disturbances are serially correlated, and the predetermined variables are
correlated with the disturbances. In addition, as noted earlier, the
structural equations are underidentified. The identification problem is
addressed first, followed by a discussion of a procedure presented by Fair
(1970) for obtaining consistent and asymptotically efficient estimates of the
structural parameters using a modified version of 2SLS (denoted, following
Fair [1970] and Amemiya [1966], as S2SLS).
The identification question is at the heart of applying structural
equations to XLPC designs; the aim is not only to exactly identify or
overidentify the equations, but also to include all major causal variables in
the equations, particularly those providing spurious relationships (e.g.,
synchronous or cross-lagged common factors [cf. Kenny, 1973]). As discussed
earlier, not all causal variables will likely be included, thus creating a
specification error. However, an emphasis on including multiple sources of
relevant causality provides a more explanatory theoretical network as well
as an opportunity to test competing hypotheses in overidentified models.

For exemplary purposes, only enough exogenous variables were added to
equations 17 and 18 to exactly identify the equations. This involved adding
variables $X_{1t}$ and $X_{2t}$ to equation 17, and variables $X_{2t}$ and
$X_{3t}$ to equation 18.^5 The exogenous variables were assumed to be
independent of the disturbance terms in each equation. The new equations are
(in deviation form)

$$y_{1t} = b_{12} y_{2t} + c_{11} x_{1t} + c_{12} x_{2t} + c_{13} y_{2t-1} + c_{14} y_{1t-1} + g_{1t} \tag{19}$$

$$y_{2t} = b_{21} y_{1t} + c_{22} x_{2t} + c_{23} x_{3t} + c_{24} y_{1t-1} + c_{25} y_{2t-1} + g_{2t} \tag{20}$$

where $g_{1t}$ and $g_{2t}$ are the disturbance terms; serial correlation
between the disturbance terms is represented by
$g_{1t} = \rho_{11} g_{1t-1} + h_{1t}$ and
$g_{2t} = \rho_{22} g_{2t-1} + h_{2t}$, and the previous probability limits
remain the same with $d$ replaced by $g$, and $e$ replaced by $h$.

The application of S2SLS proceeds in the following manner. To conserve
space, the equation for $y_{1t}$ receives focus. First, the generalized
differencing process is applied to equation 19 in order to replace the
serially correlated $g_{1t}$ with the random component $h_{1t}$:

$$y_{1t} - \hat{\rho}_{11} y_{1t-1} = b_{12} (y_{2t} - \hat{\rho}_{11} y_{2t-1}) + c_{11} (x_{1t} - \hat{\rho}_{11} x_{1t-1}) + c_{12} (x_{2t} - \hat{\rho}_{11} x_{2t-1}) + c_{13} (y_{2t-1} - \hat{\rho}_{11} y_{2t-2}) + c_{14} (y_{1t-1} - \hat{\rho}_{11} y_{1t-2}) + [(\rho_{11} - \hat{\rho}_{11}) g_{1t-1} + h_{1t}] \tag{21}$$

where $\hat{\rho}_{11}$ is an estimate of $\rho_{11}$, the disturbance term
will equal $h_{1t}$ if an appropriate least squares estimate of $\rho_{11}$ is
obtained (discussed later), and an additional wave of data must be collected
for the endogenous and exogenous variables.

The second step involves the first stage of S2SLS. To visualize the
development of the reduced form, it should be noted that the only variable
in equation 21 correlated with the new error term ($h_{1t}$) in the limit is
$y_{2t}$, which is due to the nonrecursiveness of the equation. All other
variables in equation 21 are predetermined (i.e., lagged endogenous,
cross-lagged endogenous, and current and lagged exogenous), and are not
correlated with $h_{1t}$ in the limit as a result of either assumptions or the
generalized differencing process. Thus, a reduced form is needed to obtain a
predicted score for $y_{2t}$ based upon all predetermined variables in
equation 21 and the predetermined variables that would be obtained from
applying the generalized differencing process to equation 20.

In general terms, using reduced form estimated parameters, the reduced
form for $y_{2t}$ is

$$\hat{y}_{2t} = \hat{\pi}_{21} y_{2t-1} + \hat{\pi}_{22} y_{2t-2} + \hat{\pi}_{23} y_{1t-1} + \hat{\pi}_{24} y_{1t-2} + \hat{\pi}_{25} x_{1t} + \hat{\pi}_{26} x_{1t-1} + \hat{\pi}_{27} x_{2t} + \hat{\pi}_{28} x_{2t-1} + \hat{\pi}_{29} x_{3t} + \hat{\pi}_{210} x_{3t-1}$$

where the predicted scores for $y_{2t}$ are direct functions of the
predetermined variables, and the reduced form disturbance is
$m_{2t} = (b_{21} h_{1t} + h_{2t}) / (1 - b_{21} b_{12})$.
OLS provides the predicted $y_{2t}$ scores.

The third step in the procedure is to replace
$(y_{2t} - \hat{\rho}_{11} y_{2t-1})$ with
$(\hat{y}_{2t} - \hat{\rho}_{11} y_{2t-1})$ in equation 21, where
$\operatorname{plim}(\tfrac{1}{n} \sum \hat{y}_{2t} h_{1t})$ is now equal to
zero. If $y_{2t} - \hat{y}_{2t}$ is set equal to $\hat{v}_{2t}$, the new
disturbance term for equation 21 for the second-stage regression is equal to
$[(\rho_{11} - \hat{\rho}_{11}) g_{1t-1} + h_{1t} + b_{12} \hat{v}_{2t}]$. The
second stage of 2SLS is then conducted using OLS. However, because
$\rho_{11}$ can only be estimated, several OLS analyses are conducted, using
values of $\hat{\rho}_{11}$ varying between 1 and -1 (or an iterative
procedure is used). The OLS analysis with the estimated value of $\rho_{11}$
which yields the smallest sum of squared residuals of the second-stage
regression, and the corresponding estimates of the structural parameters, is
selected as a solution (an iterative procedure is provided by Fair [1970,
p. 509]).

The minimum sum of squared residuals occurs where $\hat{\rho}_{11}$ equals
$\rho_{11}$ (in large samples), leaving the error term
$(h_{1t} + b_{12} \hat{v}_{2t})$, which has a zero expected value and in the
limit is uncorrelated with $\hat{y}_{2t}$ as well as with the predetermined
variables. That is, the predetermined variables are neither correlated with
$h_{1t}$, for reasons already discussed, nor with $b_{12} \hat{v}_{2t}$,
because they were used as predictors in the first stage of S2SLS. The values
for $\hat{y}_{2t}$ are uncorrelated with the disturbance term in the second
stage of S2SLS based on the logic presented earlier for 2SLS. (Another reason
for employing lagged endogenous, cross-lagged endogenous, and current and
lagged exogenous variables in the first-stage regression is to insure the
orthogonality of $\hat{v}_{2t}$ and $g_{1t-1}$, which is necessary if the
minimum sum of squared residuals is to occur where $\hat{\rho}_{11}$ equals
$\rho_{11}$ [Fair, 1970, p. 509].) Thus, the S2SLS procedure provides
consistent estimates of the structural parameters.
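The three steps described above (generalized differencing, a first-stage reduced form regression, and a second-stage regression repeated over trial values of the serial correlation coefficient) can be sketched as a grid search. This is only an illustration under stated assumptions: a single simulated time series stands in for panel data, all parameter values are invented for the simulation, and the function names `ols`, `simulate`, and `s2sls_grid` are hypothetical. It is not Fair's (1970) iterative algorithm.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def simulate(T=8000, seed=1):
    """Generate data from equations 19 and 20 with AR(1) disturbances (invented values)."""
    rng = np.random.default_rng(seed)
    b12, b21 = 0.4, 0.3
    c11, c12, c13, c14 = 1.0, 0.5, 0.2, 0.2
    c22, c23, c24, c25 = 0.5, 1.0, 0.1, 0.3
    rho11, rho22 = 0.5, 0.3
    x1, x2, x3 = rng.normal(size=(3, T))
    h1, h2 = rng.normal(size=(2, T))
    y1, y2, g1, g2 = (np.zeros(T) for _ in range(4))
    det = 1.0 - b12 * b21
    for t in range(1, T):
        g1[t] = rho11 * g1[t - 1] + h1[t]
        g2[t] = rho22 * g2[t - 1] + h2[t]
        a1 = c11 * x1[t] + c12 * x2[t] + c13 * y2[t - 1] + c14 * y1[t - 1] + g1[t]
        a2 = c22 * x2[t] + c23 * x3[t] + c24 * y1[t - 1] + c25 * y2[t - 1] + g2[t]
        y1[t] = (a1 + b12 * a2) / det   # solve the two simultaneous equations
        y2[t] = (a2 + b21 * a1) / det
    return y1, y2, x1, x2, x3

def s2sls_grid(y1, y2, x1, x2, x3, grid):
    """Return (rho_hat, coefficients, ssr) for the differenced form of equation 19."""
    cur, l1, l2 = slice(2, None), slice(1, -1), slice(0, -2)  # waves t, t-1, t-2
    # First stage: regress y2_t on all predetermined variables (independent of rho).
    Z = np.column_stack([y2[l1], y2[l2], y1[l1], y1[l2],
                         x1[cur], x1[l1], x2[cur], x2[l1], x3[cur], x3[l1]])
    pi_hat, _ = ols(Z, y2[cur])
    y2_hat = Z @ pi_hat
    best = None
    for rho in grid:
        lhs = y1[cur] - rho * y1[l1]
        X = np.column_stack([y2_hat - rho * y2[l1],      # b12
                             x1[cur] - rho * x1[l1],     # c11
                             x2[cur] - rho * x2[l1],     # c12
                             y2[l1] - rho * y2[l2],      # c13
                             y1[l1] - rho * y1[l2]])     # c14
        beta, ssr = ols(X, lhs)  # second stage by OLS for this trial rho
        if best is None or ssr < best[2]:
            best = (float(rho), beta, ssr)
    return best

y1, y2, x1, x2, x3 = simulate()
rho_hat, beta, ssr = s2sls_grid(y1, y2, x1, x2, x3, np.linspace(-0.95, 0.95, 39))
print("rho_hat:", rho_hat, "b12_hat:", round(float(beta[0]), 3))
```

The trial value with the smallest second-stage sum of squared residuals is retained, together with its structural parameter estimates. A finer grid, or an iterative search, would sharpen the estimate of the serial correlation coefficient.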

For tests of goodness of fit, one would have to begin with overidentified
structural equations, and, using the logic presented for nonrecursive
models, add predetermined variables to the equations until exact
identification was achieved. The S2SLS procedure would then be repeated on the
exactly identified equations and the resulting parameter estimates examined to
ascertain if they diverged from the assumed causal model.^6

In summary, the application of structural equations and S2SLS to the
XLPC design, or more generally to models involving lagged endogenous variables
and nonrecursive relationships, is a rather complex process, complexity being
interpreted in terms of the theoretical system that is required, the
assumptions that must be made, the amount of data that must be collected, and
the statistical procedures that are necessitated. For example, at least three
waves of data must be collected (which includes the exogenous variables if
lagged values of such are included in the equations prior to differencing).
The variables must be highly reliable for reasons of parameter estimation,
the calculation of difference scores (cf. Cronbach & Furby, 1970; Lord &
Novick, 1968), and the possibility that the measurement errors of unreliable
variables could be serially correlated over time (cf. Namboodiri et al.,
1975; Pindyck & Rubinfeld, 1976). Furthermore, the inclusion of lagged
variables in the equations may well present a problem of multicollinearity
(cf. Johnston, 1972). Thus, while multiple lags are desirable to analyze
such possibilities as positive or negative feedback loops (cf. Miller, 1971;
Pelz & Lew, 1970), the addition of lagged variables to the equations may be
dysfunctional from another standpoint. Finally, the time lags and the model
in general must be correctly specified, both of which are sizeable
requirements. For example, the model may not be first-order autoregressive as
assumed here, nor may the stationarity assumption be viable (see Pindyck and
Rubinfeld [1976] for time-series procedures that might be employed when these
assumptions are violated). A related concern is differing stabilities of the
causal factors over time, where for example differing degrees of stability
(short-term versus long-term) have different effects on the magnitude and
even the sign of the parameter estimates, especially if the measurement periods
do not correspond to the causal intervals (cf. Fisher, 1971; Pelz & Lew, 1970).

Unfortunately, space does not permit further consideration of the above
issues, and the reader is referred to the cited references for additional
reading. This section is concluded by simply reiterating that the XLPC
design, or some variation thereof, is perhaps the most likely candidate at
the present time in psychology for analyzing designs which include lagged
endogenous variables and recursive relationships. This should not be
construed to mean that the XLPC is a nondemanding procedure. The truth of the
matter is that XLPC designs are quite demanding with respect to assumptions
and data, although not as demanding as structural equations. Nevertheless,
the development of appropriate structural models, which may include
nonrecursive relationships, is a desirable goal for psychology because they
provide a stronger theoretical and causal foundation.

DISCUSSION 

The major goal of the present report has been to introduce psychologists
to the rationale, assumptions, and analytical procedures of 2SLS and its
applications to selected structural equations. These include nonrecursive
structural equations, structural equations which contain predictors with
random measurement error, and structural equations which involve lagged
values of endogenous variables. In addition, nonrecursive relationships
were included in the last application in order to increase generality and
to demonstrate how two of the applications could be combined. The first and
last applications, however, assumed perfectly reliable variables (or, from
a pragmatic standpoint, highly reliable variables). If this assumption is
not met, then a procedure such as outlined in the second application (i.e.,
the use of instrumental variables) could be added to the analysis, although
the instrumental variable approach is not a panacea for variables with large
measurement errors.

The selected applications were considered to be reflective of many
psychological phenomena. In particular, nonrecursive models may add a new
dimension of analysis to the current Zeitgeist of interactionism in
psychology. Moreover, in the presence of strong theory, hopefully based in
part on previous research, the use of cross-sectional data should not preclude
the development of at least tentative causal models if in fact assumptions have
been reasonably met. Such research can provide a strong foundation on which
to proceed to dynamic models, which presumably provide a stronger test of
the model. It is of utmost importance to note that a crucial issue in the
use of dynamic models and lagged variables is the degree to which the
measurement of variables corresponds to real-world temporal sequences and
time lags. Furthermore, assumptions such as the equilibrium-type condition
are maintained in the dynamic analyses. That is, the stationarity assumption
of cross-lagged panel correlation requires that the causal process is in
equilibrium (i.e., the structural equation for a variable is invariant with
respect to time of measurement). However, if the assumptions for dynamic
analysis are met, opportunities are provided to address and to test directly
several issues which are typically assumptions and not wholly testable in
cross-sectional analyses. These issues include the source and direction of
causation, the necessity-sufficiency of causation, and dynamic-static causal
relationships (cf. Feldman, 1975). On the other hand, if assumptions
regarding dynamic analyses cannot be met, analyses based on cross-sectional
data may provide the more meaningful results, particularly if an
equilibrium-type condition exists and the unmet assumption for dynamic analysis
is measurement corresponding to real-world temporal sequences.

In conclusion, causal analyses that employ structural equations and
passive data, either lagged or nonlagged, have as a primary focus the
identification of untenable causal models rather than the identification of a
"true" causal model. This requires the use of overidentified models so that
different models may be tested, and emphasis is placed on conceptual issues,
rationale, and assumptions, and the internal consistency of results with
respect to the theory, rationale, and assumptions. The strength of these
models lies in nonexperimental inference (Kenny, 1975), even though problems
concerning causal inferences associated with passive data obtained from
natural observation and thus lacking randomization and control are well known.
^1 The equations presented in this paper employ unstandardized variables
and unstandardized regression weights as estimates of (unstandardized)
structural parameters. The use of standardized variables and their
corresponding "beta" weights has been quite popular because structural
equations can then be addressed in a "path analysis" paradigm. However, we have
chosen to employ unstandardized variables because standardization of
variables might obscure distinctions between estimates of structural parameters
and the variances-covariances that describe joint distributions of variables
in a population. That is, if it is possible for a variable to have different
distributions for reasons such as the use of different populations or changes
in a particular population over time, then unstandardized regression weights
should be employed because they are still comparable across the distributions
while standardized regression weights are not (cf. Namboodiri et al., 1975;
Wiley & Wiley, 1971).


^2 The authors would like to thank an unknown reviewer for pointing out
that a sufficient condition for equilibrium would be "to allow some movements
of the dependent variable values if these are compensated by the inverse
movement of the dependent variables by other individuals with similar values
on the exogenous variables".

^3 The total number of alternative causal models that might be tested in
many designs could be considered infinite. In this example, we have only
considered the causal relationships among observables, with the assumptions
associated with nonrecursive models (e.g., correlated disturbances).
However, the introduction of nonobservables, the possibility of serially
correlated disturbances, and so forth could greatly extend the complexity of
the model as well as competing causal hypotheses. On the other hand, the
number of competing causal hypotheses is usually reduced depending on the
causal closure and the nature of theoretical orientation. That is, if there is
a compelling reason to believe that there are only a few causal structures
that would be meaningful, there is no reason to test all possible
permutations and combinations which may emerge. For example, if one is fairly
sure of the fact that there is an asymmetric causal relationship which will
hold, the number of alternative models is automatically reduced and would not
require further testing.

^4 There is, of course, the possibility of recursive relationships, which
include the leader causing subordinate behaviors, the subordinate causing
leader behaviors, and various feedback loops with known time lags. To avoid
complexity, we have not addressed these possibilities here.
^5 The additional inclusion of lagged values of the exogenous variables
would present a more realistic example; however, we are attempting to
provide a general introduction and to minimize complexity. On the other hand,
as will be shown, lagged values of the exogenous variables will enter into
the analysis.

^6 As noted earlier, one reason for employing a nonrecursive model in the
present discussion was that a reciprocal relationship between the dependent
variables provided a competing hypothesis for the XLPC design, which
traditionally has been viewed as an asymmetric causal model. However, the
S2SLS and goodness of fit tests may indicate that the structural model is
recursive rather than nonrecursive (i.e., the nonrecursive relationships are
not empirically substantiated). In this circumstance, or in cases where
nonrecursive relationships can be ruled out a priori, a different application
of 2SLS may be employed when the XLPC design is viewed in terms of structural
equations (or in more general terms, when the model is recursive and includes
lagged endogenous variables with serially correlated disturbances). As
described by Johnston (1972) and Wallis (1967), this procedure involves
replacing the lagged endogenous variables with instruments, based on estimates
provided by lagged values of exogenous variables, and then applying a
generalized least squares estimation procedure to compute the second-stage
regressions.


Figure Captions

Figure 1. Graphic illustration of a recursive model.

Figure 2. Graphic illustration of a nonrecursive model.

Figure 3. A nonrecursive model incorporating exogenous variables
and disturbance terms.

Figure 4. Cross-lagged panel correlation design.

Figure 5. Cross-lagged panel correlation design viewed in terms
of a structural model with three waves of data.
Appendix A: Development of a Reduced Form in 2SLS

The reduced form for a set of equations is obtained by a process of
substitution in which the endogenous variables appearing in a particular
equation as predictors are replaced by the right side of their respective
structural equation. The solution for the value of each endogenous variable
is then obtained in terms of the exogenous (predetermined) variables only,
plus a disturbance term. For example, in the exactly identified,
nonrecursive equations

$$y_1 = b_{12} y_2 + c_{11} x_1 + d_1 \tag{A.1}$$

$$y_2 = b_{21} y_1 + c_{22} x_2 + d_2 \tag{A.2}$$

$y_1$ may be expressed as a function of the predetermined variables ($x_1$ and
$x_2$) and a disturbance term by substituting the right side of equation A.2
in equation A.1 to replace $y_2$. The reduced form for $y_1$ is therefore

$$\begin{aligned}
y_1 &= b_{12} (b_{21} y_1 + c_{22} x_2 + d_2) + c_{11} x_1 + d_1 \\
    &= b_{12} b_{21} y_1 + b_{12} c_{22} x_2 + b_{12} d_2 + c_{11} x_1 + d_1 \\
    &= \frac{c_{11} x_1 + b_{12} c_{22} x_2 + d_1 + b_{12} d_2}{1 - b_{12} b_{21}}
\end{aligned} \tag{A.3}$$

In more general terms, the reduced form for the $y_1$ and $y_2$ equations is
viewed as

$$y_1 = \hat{\pi}_{11} x_1 + \hat{\pi}_{12} x_2 + m_1 \tag{A.4}$$

$$y_2 = \hat{\pi}_{21} x_1 + \hat{\pi}_{22} x_2 + m_2 \tag{A.5}$$

where the predicted values of the endogenous variables, $\hat{y}_1$ and
$\hat{y}_2$, are given by the right sides of equations A.4 and A.5 without the
disturbance terms, the $\hat{\pi}_{ij}$ represent the unbiased estimates of
the population reduced form parameters ($\pi_{ij}$), and the $m_i$ represent
the disturbance terms for the reduced form.

The predicted values $\hat{y}_1$ and $\hat{y}_2$ are obtained by simply
applying OLS to equations A.4 and A.5, which also provides the estimates
$\hat{\pi}_{ij}$. Of interest, however, is the fact that the (estimated)
reduced form parameters are exact nonlinear functions of the (estimated)
structural parameters, and vice-versa (Duncan, 1975). For example,
$\hat{\pi}_{12}$ in equation A.4 is equal to
$b_{12} c_{22} / (1 - b_{12} b_{21})$.

Appendix B: Matrix Algebra for Applying 2SLS to
Nonrecursive Structural Equations
and the Rank Condition

As presented by Johnston (1972), a particular equation selected from a
set of simultaneous, nonrecursive equations may be viewed as

$$\mathbf{y} = \mathbf{Y}_1 \mathbf{b} + \mathbf{X}_1 \mathbf{c} + \mathbf{d}, \tag{B.1}$$

where

$\mathbf{y}$ is an n x 1 vector of observations (raw scores) on the dependent
variable,

$\mathbf{Y}_1$ is an n x g matrix of observations on mutually interacting
endogenous variables included in the equation,

$\mathbf{b}$ is a g x 1 vector of estimated structural parameters attached to
the $\mathbf{Y}_1$ variables,

$\mathbf{X}_1$ is an n x k matrix of observations on the exogenous
(predetermined) variables in the equation (a column of ones is included if an
intercept is required),

$\mathbf{c}$ is a k x 1 vector of estimated structural parameters for the
$\mathbf{X}_1$ variables, and

$\mathbf{d}$ is an n x 1 vector of disturbances for this equation.

It is assumed that

$$\operatorname{plim}\left(\tfrac{1}{n} \mathbf{X}' \mathbf{d}\right) = \mathbf{0}, \quad \text{while} \quad \operatorname{plim}\left(\tfrac{1}{n} \mathbf{Y}_1' \mathbf{d}\right) \neq \mathbf{0}.$$

The first-stage regression or reduced form is

$$\hat{\mathbf{Y}}_1 = \mathbf{X} (\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{Y}_1 \tag{B.2}$$

where $\mathbf{X} = [\mathbf{X}_1 \; \mathbf{X}_2]$, which is equal to the
n x K matrix of observations on all exogenous (predetermined) variables, given
that $\mathbf{X}_2$ is the matrix of observations on the exogenous
(predetermined) variables not included in the equation under study.

The second stage of 2SLS involves regressing $\mathbf{y}$ on
$\hat{\mathbf{Y}}_1$ (which replaces $\mathbf{Y}_1$) and $\mathbf{X}_1$. The
estimating equations for this analysis are

$$\begin{bmatrix} \hat{\mathbf{Y}}_1' \hat{\mathbf{Y}}_1 & \hat{\mathbf{Y}}_1' \mathbf{X}_1 \\ \mathbf{X}_1' \hat{\mathbf{Y}}_1 & \mathbf{X}_1' \mathbf{X}_1 \end{bmatrix} \begin{bmatrix} \mathbf{b} \\ \mathbf{c} \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{Y}}_1' \mathbf{y} \\ \mathbf{X}_1' \mathbf{y} \end{bmatrix} \tag{B.3}$$

where $\mathbf{b}$ and $\mathbf{c}$ are estimates of the population structural
parameters based on 2SLS.
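Equations B.2 and B.3 translate almost directly into array operations. The sketch below is a minimal illustration, assuming simulated data generated from equations A.1 and A.2 of Appendix A with invented parameter values; the function name `two_stage_least_squares` is hypothetical, and no standard errors are computed.

```python
import numpy as np

def two_stage_least_squares(y, Y1, X1, X2):
    """Return (b, c) from equations B.2 and B.3.

    y: (n,) dependent variable; Y1: (n, g) included endogenous predictors;
    X1: (n, k) included exogenous variables; X2: excluded exogenous variables.
    """
    X = np.column_stack([X1, X2])                      # all predetermined variables
    # B.2: first-stage fitted values, Y1_hat = X (X'X)^{-1} X' Y1
    Y1_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y1)
    # B.3: normal equations of the regression of y on [Y1_hat, X1]
    W = np.column_stack([Y1_hat, X1])
    coef = np.linalg.solve(W.T @ W, W.T @ y)
    g = Y1.shape[1]
    return coef[:g], coef[g:]

# Simulated data from A.1 and A.2 via their reduced forms (illustrative values).
rng = np.random.default_rng(0)
n = 20000
b12, b21, c11, c22 = 0.5, 0.3, 1.0, 1.0
x1, x2, d1, d2 = rng.normal(size=(4, n))
det = 1.0 - b12 * b21
y1 = (c11 * x1 + b12 * c22 * x2 + d1 + b12 * d2) / det
y2 = (b21 * c11 * x1 + c22 * x2 + b21 * d1 + d2) / det

# Estimate equation A.1: y1 on y2 (endogenous) and x1, with x2 as the excluded variable.
b_hat, c_hat = two_stage_least_squares(y1, y2[:, None], x1[:, None], x2[:, None])

# OLS on the same equation, for comparison with the 2SLS estimate.
ols_coef = np.linalg.lstsq(np.column_stack([y2, x1]), y1, rcond=None)[0]
print("2SLS b12:", round(float(b_hat[0]), 3), " OLS b12:", round(float(ols_coef[0]), 3))
```

Solving the normal equations with `np.linalg.solve` mirrors the algebra of B.3; in applied work a least squares routine would usually be preferred for numerical stability.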

The necessary and sufficient rank condition for identification can be
shown by designating $\mathbf{X}_2$ as a column vector of exogenous variables
excluded from equation B.1 (for derivation purposes, the references to
observations will be deleted, and population values are assumed). The
equations for the reduced form corresponding to the $\mathbf{Y}_1$ variables
only are then (Fisher, 1966, p. 52)

$$\mathbf{Y}_1 = \boldsymbol{\Pi}^{11} \mathbf{X}_1 + \boldsymbol{\Pi}^{12} \mathbf{X}_2 + \mathbf{V}_1$$

where $\mathbf{X}_1$ is a column vector of exogenous variables included in the
equation, $\boldsymbol{\Pi}^{11}$ and $\boldsymbol{\Pi}^{12}$ represent
(population) reduced form parameters, and $\mathbf{V}_1$ is a column vector of
reduced form disturbances.

A necessary and sufficient condition for identification of equation B.1 is
that the rank of $\boldsymbol{\Pi}^{12}$ be equal to the number (r) of
endogenous variables included in $\mathbf{Y}_1$; $\boldsymbol{\Pi}^{12}$ has r
rows and K - f columns, where f is equal to the number of exogenous variables
in $\mathbf{X}_1$ (Fisher, 1966, p. 54).
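The rank condition can be checked numerically once the block of reduced form parameters attached to the excluded exogenous variables is available. The sketch below assumes population values taken from the two-equation example of Appendix A, where that block is a single number for equation A.1; the function name `rank_condition_holds` is hypothetical.

```python
import numpy as np

def rank_condition_holds(Pi12, tol=1e-10):
    """True if rank(Pi12) equals its number of rows r (endogenous variables in Y1)."""
    Pi12 = np.atleast_2d(np.asarray(Pi12, dtype=float))
    r = Pi12.shape[0]
    return bool(np.linalg.matrix_rank(Pi12, tol=tol) == r)

# For equation A.1, Y1 = y2 and the excluded variable is x2, so the block is
# the coefficient of x2 in the reduced form for y2: c22 / (1 - b12 * b21).
b12, b21 = 0.5, 0.3
identified = rank_condition_holds([[1.0 / (1.0 - b12 * b21)]])      # c22 = 1
not_identified = rank_condition_holds([[0.0 / (1.0 - b12 * b21)]])  # c22 = 0
print(identified, not_identified)
```

When the excluded variable has no effect on the included endogenous predictor (c22 = 0 in this example), the block has rank zero and the equation is not identified, even though the count of excluded variables is unchanged.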